Closed by jhpoelen 2 months ago
As of Preston 0.8.5, remotes are queried for "preston.tar.gz" in addition to the "preston-[a-f0-9]{2}.tar.gz" patterns.
For example usage, see
Poelen, J. H. (2024). A biodiversity dataset graph: Biological Associations in TaxonWorks hash://sha256/e4a47c067d6c125da60c9a1b92b5eecdea539cb8666cd3aed99db347ae5b8ed0 hash://md5/686007de79cc2a49ab23fd3debe56e3f (0.3) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11151783
This enables commands such as:
preston clone --remote https://zenodo.org/records/11151783/files
which would use preston.tar.gz to clone the dataset.
Currently, Preston allows for discovering resources in tar.gz files on remotes.
For instance, when retrieving content associated with
hash://sha256/2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
from a remote, Preston extracts the content from
preston-2a.tar.gz
if that archive exists. The naming convention is preston-[first two characters of the content hash].tar.gz.
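The naming convention can be illustrated with a minimal Python sketch. This is not Preston's own code (Preston is a separate tool); the function name `archive_name_for` and the regular expression are assumptions made for illustration only.

```python
import re

# Hypothetical helper, not part of Preston: matches content ids of the form
# hash://<algorithm>/<lowercase hex digest>, e.g. hash://sha256/2a5d...
_HASH_URI = re.compile(r"^hash://([a-z0-9]+)/([a-f0-9]{2,})$")


def archive_name_for(content_id: str) -> str:
    """Return the archive name that would hold the content, per the convention above."""
    match = _HASH_URI.match(content_id)
    if match is None:
        raise ValueError(f"not a hash URI: {content_id!r}")
    hex_digest = match.group(2)
    # The first two hex characters of the content hash select the archive.
    return f"preston-{hex_digest[:2]}.tar.gz"


print(archive_name_for(
    "hash://sha256/2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a"
))  # prints preston-2a.tar.gz
```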
The reason for this feature is to bundle resources and so keep the file count low. For instance, when a remote limits the number of files to 100 (like Zenodo), resources can be bundled into these tarballs.
I suggest also supporting these naming conventions:
preston-2.tar.gz (first character of the content hash)
as well as
preston.tar.gz (all content is found in this one archive).
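As a rough sketch of how a client might enumerate archives under these conventions, the Python snippet below lists candidate URLs for a given content id. The function name `candidate_archive_urls`, the ordering (most specific archive first), and the way the archive name is appended to the remote URL are all assumptions for illustration, not a description of how Preston actually resolves remotes.

```python
def candidate_archive_urls(remote: str, content_id: str) -> list[str]:
    """List archive URLs to try, from most specific to catch-all (assumed order)."""
    # Take the hex digest after the last "/" of hash://<algorithm>/<digest>.
    hex_digest = content_id.rsplit("/", 1)[-1]
    names = [
        f"preston-{hex_digest[:2]}.tar.gz",  # first two hash characters
        f"preston-{hex_digest[:1]}.tar.gz",  # first hash character
        "preston.tar.gz",                    # all content in one archive
    ]
    base = remote.rstrip("/")
    # Assumption: archives sit directly under the remote URL.
    return [f"{base}/{name}" for name in names]


for url in candidate_archive_urls(
    "https://zenodo.org/records/11151783/files",
    "hash://sha256/2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a",
):
    print(url)
```

Trying the most specific archive first keeps downloads small when a remote provides per-prefix bundles, while the catch-all preston.tar.gz covers remotes that publish a single archive.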