biobricks-ai / biobricks

BioBricks makes loading data from biological datasets and databases easy. It provides Python and R interfaces, data version control, and an API for pulling datasets that have been converted to easy-to-use formats.
https://docs.biobricks.ai
MIT License

Data downloads are too big for clinvar download #11

Closed: svsuresh closed this issue 8 months ago

svsuresh commented 9 months ago

I see that just to set up clinvar, as outlined in the tutorial, a user needs to download 1 GB. It would be helpful to state how much disk space is needed for each reference. Such big downloads for small files like clinvar (clinvar.vcf) surprised me, and huge files with random numbers are not welcome for my purpose. I stalled the installation there. Here is the screenshot from setting up clinvar:

[Screenshot: clinivar-bb]

tomlue commented 9 months ago

It is true that many biobricks are large. The clinvar brick collects data from https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited, which is a bit larger than 1 GB.

Do you have a suggested solution for this issue?

"huge files with random numbers are not welcome"

The biobricks system names files after their content hashes, which is why the file names look random. This comes from our dependency on dvc. In practice, you shouldn't need to worry about these names in your work.
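To illustrate why hash-based names look random, here is a minimal sketch of content-addressed naming. It is not the biobricks or dvc implementation: the function name `content_address`, the `cache` root, and the two-character directory split are assumptions made for illustration, and the actual cache layout depends on the dvc version in use.

```python
import hashlib
from pathlib import Path


def content_address(data: bytes, cache_root: str = "cache") -> Path:
    """Return a storage path derived only from the file's content.

    Hypothetical helper: the layout (first two hex characters as a
    directory, the rest as the file name) is an assumption for illustration.
    """
    digest = hashlib.md5(data).hexdigest()  # 32 hex characters
    return Path(cache_root) / digest[:2] / digest[2:]


# Identical content always maps to the same path, whatever the original name.
print(content_address(b"hello").as_posix())
```

Because the path depends only on the bytes, two files with the same content are stored once, and any change to the content produces a different name.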

tomlue commented 8 months ago

status.biobricks.ai now makes the size of the assets clear.
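Once an asset's size is known, a user could check for enough free disk space before pulling it. The sketch below uses only the Python standard library; the function name `has_room`, the 10% safety margin, and the 1 GiB figure are assumptions for illustration, and the size would have to be read off the status page by hand.

```python
import shutil


def has_room(path: str, required_bytes: int, margin: float = 1.1) -> bool:
    """Return True if the filesystem holding `path` has enough free space.

    Hypothetical helper: `margin` adds headroom on top of the stated size.
    """
    free = shutil.disk_usage(path).free  # free bytes on that filesystem
    return free >= required_bytes * margin


# Example: check the current directory against an assumed 1 GiB download.
print(has_room(".", 1 * 1024**3))
```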