Closed lukasjelonek closed 11 months ago
bakrep download -t bakrep-export.tsv -d /tmp/my-download-dir -m tool:bakta,filetype:gff3
bakrep download -t bakrep-export.tsv -d /tmp/my-download-dir -a
bakrep download -l SAMD12345 -d /tmp/my-download-dir -a
bakrep download -l SAMD12345,SAMD77777 -d /tmp/my-download-dir -a
To allow failed or canceled downloads to be resumed, the command-line tool should persistently track which datasets have already been downloaded. On resume or on the next download, it should identify the missing datasets and download only those.
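The tracking described above could be sketched roughly as follows. This is only an illustration: the JSON state file and the function names `load_completed`, `mark_completed` and `missing_datasets` are assumptions made for this example, not part of the bakrep CLI.

```python
import json
from pathlib import Path


def load_completed(state_file: Path) -> set[str]:
    """Return the dataset ids recorded as finished (empty set if no state yet)."""
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text()))


def mark_completed(state_file: Path, completed: set[str], dataset_id: str) -> None:
    """Record one finished dataset and persist the state.

    The state is written to a temporary file first and then moved into
    place, so a crash mid-write does not corrupt the existing state file.
    """
    completed.add(dataset_id)
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(completed)))
    tmp.replace(state_file)


def missing_datasets(requested: list[str], completed: set[str]) -> list[str]:
    """Keep the requested order, dropping ids that are already downloaded."""
    return [d for d in requested if d not in completed]
```

Writing the state after every finished dataset keeps the amount of repeated work after a cancel to at most one dataset.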
At the moment we will provide a naive download mechanism that looks up the files of each dataset via the BakRep REST API, filters for the required files, and saves them to disk.
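The filter step might look like the sketch below. It assumes that each file entry returned by the lookup is a dict carrying the attributes named in the filter (for example `tool` and `filetype`) and that all criteria must match; the actual response format of the REST API and the exact semantics of `-m` are not specified here, so `parse_filter` and `select_files` are hypothetical helpers.

```python
def parse_filter(expr: str) -> dict[str, str]:
    """Turn 'tool:bakta,filetype:gff3' into {'tool': 'bakta', 'filetype': 'gff3'}."""
    criteria = {}
    for part in expr.split(","):
        key, _, value = part.partition(":")
        criteria[key.strip()] = value.strip()
    return criteria


def select_files(files: list[dict], criteria: dict[str, str]) -> list[dict]:
    """Keep only the file entries that match every criterion (assumed AND semantics)."""
    return [f for f in files if all(f.get(k) == v for k, v in criteria.items())]
```

With an empty criteria dict every file is selected, which would correspond to downloading all files of a dataset.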
I expect that users will not want to download everything into a single directory. Depending on the volume, it may be better to create subsets (subdirectories) containing at most n datasets each. The layout may be computed dynamically from the TSV input or defined statically (the CLI provides a hard-coded directory schema, possibly the one we use for storage internally).
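A minimal sketch of the dynamic variant, assuming datasets are simply grouped in input order into numbered subdirectories; the `batch_NNNN` naming and the function `assign_subdirs` are made up for illustration and are not the internal storage schema.

```python
from pathlib import Path


def assign_subdirs(dataset_ids: list[str], base: Path, n: int) -> dict[str, Path]:
    """Map each dataset id to a target directory with at most n datasets per batch."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Integer division of the position by n yields the batch number.
    return {
        dataset_id: base / f"batch_{index // n:04d}" / dataset_id
        for index, dataset_id in enumerate(dataset_ids)
    }
```

Because the mapping depends only on the order of the input, the same TSV always produces the same layout, which matters when a download is resumed.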
In a first version, downloading may be single-threaded. Future versions can be updated to use asyncio or multithreading.
The CLI shall display the progress in a simple ${processed}/${total} format that is updated every time a dataset download finishes. Additionally, it may show the ID of the dataset that is currently being downloaded. To notify the user that a download is being continued, an additional line that states the number of already downloaded datasets should be added.
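The progress output could be produced along these lines. The exact wording of the resume notice and the helper names `format_progress` and `format_resume_notice` are assumptions for this sketch.

```python
from typing import Optional


def format_progress(processed: int, total: int, current_id: Optional[str] = None) -> str:
    """Build the '${processed}/${total}' line, optionally with the current dataset id."""
    line = f"{processed}/{total}"
    if current_id:
        line += f" {current_id}"
    return line


def format_resume_notice(already_downloaded: int) -> str:
    """Build the extra line shown when a previous download is continued."""
    return f"Resuming download: {already_downloaded} datasets already downloaded"
```

A caller would print the resume notice once at startup and then reprint the progress line after each finished dataset, for example with `print(format_progress(3, 10, "SAMD12345"), end="\r")` to overwrite the previous line in a terminal.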
Manages all datasets that need to be downloaded.
Downloads a dataset to a provided location
None
Validates the checksums of all files for a dataset
None
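The checksum validation named above might be sketched as follows. The hash algorithm and the way expected checksums are provided are not stated in this issue, so the example assumes a mapping from file name to hex digest and makes the algorithm a parameter; `file_digest` and `invalid_files` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def invalid_files(dataset_dir: Path, expected: dict[str, str],
                  algorithm: str = "sha256") -> list[str]:
    """Return the names of files that are missing or whose checksum differs."""
    bad = []
    for name, expected_digest in expected.items():
        path = dataset_dir / name
        if not path.is_file() or file_digest(path, algorithm) != expected_digest.lower():
            bad.append(name)
    return bad
```

A dataset would only be recorded as downloaded when `invalid_files` returns an empty list, so that a resumed run retries datasets with corrupt or incomplete files.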
The tool is now available at https://github.com/ag-computational-bio/bakrep-cli.
Finished
It should be possible to download a collection of datasets to the user's computer via a command-line tool. The tool should be able to download only certain files of the datasets, e.g. only the protein FASTA files. It should also be possible to use the exported TSV file to download all datasets listed in it.