biobricks-ai / biobricks

BioBricks makes loading data from biological datasets and databases easy. It provides Python and R interfaces, data version control, and an API for pulling datasets that have been converted to easy-to-use formats.
https://docs.biobricks.ai
MIT License

Pushing a single directory (e.g., `brick/`) with `dvc` then pulling fails #16

Closed (zmughal closed this issue 4 months ago)

zmughal commented 6 months ago

First run:

$ dvc push -r s3.biobricks.ai build

Then:

$ biobricks install uniprot-kg
2024-01-08 11:12:21 | INFO: getting latest version of https://github.com/biobricks-ai/uniprot-kg
2024-01-08 11:12:21 | INFO: running checks on brick
2024-01-08 11:12:22 | INFO: git clone https://github.com/biobricks-ai/uniprot-kg biobricks-ai/uniprot-kg/d180c6e6e3537ef93ff9902d189e19c9e3af00b6 in /mnt/ssd/biobricks
2024-01-08 11:12:22 | INFO: adding brick to dvc cache
2024-01-08 11:12:22 | INFO: setting up credentials for dvc.biobricks.ai
2024-01-08 11:12:23 | INFO: discovering brick assets dvc.biobricks.ai
2024-01-08 11:12:24 | INFO: pulling brick assets
Collecting |0.00 [00:00, ?entry/s]
Fetching
ERROR: unexpected error - failed to load directory ('93', '61da160c0d2f37fba41ec102ff2c8e.dir'): [Errno 2] No such file or directory: '/mnt/ssd/biobricks/cache/files/md5/93/61da160c0d2f37fba41ec102ff2c8e.dir'

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-01-08 11:12:25 | INFO: https://github.com/biobricks-ai/uniprot-kg#d180c6e6e3537ef93ff9902d189e19c9e3af00b6 succesfully downloaded to BioBricks library.

The hash in the error is the MD5 for the download directory, which had not been pushed to the remote at the time.
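For reference, the path in the error can be reproduced from the hash. This is a minimal sketch that assumes the cache layout visible in the log above (`files/md5/<first two hex characters>/<remainder>`); `dvc_cache_path` is a hypothetical helper name, not a function from DVC or biobricks.

```python
from pathlib import PurePosixPath


def dvc_cache_path(cache_root: str, md5: str) -> PurePosixPath:
    """Map a hash to its cache location, following the layout seen in the log.

    Hypothetical helper for illustration only. Directory objects carry a
    ".dir" suffix, which stays attached to the second path component.
    """
    prefix, remainder = md5[:2], md5[2:]
    return PurePosixPath(cache_root) / "files" / "md5" / prefix / remainder


# Hash and cache root taken from the error message in this issue.
missing = dvc_cache_path(
    "/mnt/ssd/biobricks/cache",
    "9361da160c0d2f37fba41ec102ff2c8e.dir",
)
print(missing)
```

The two-character prefix `93` and the remainder `61da160c0d2f37fba41ec102ff2c8e.dir` match the tuple printed in the error.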

tomlue commented 4 months ago

Starting from version 0.3.1, the biobricks CLI on PyPI no longer relies on DVC for data retrieval. Instead, it directly downloads the files listed in the repository's dvc.lock file, limited to files under the brick/ path prefix.
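As a rough illustration of that filtering step (not the actual biobricks implementation), the sketch below assumes dvc.lock has already been parsed into a dict with the usual `stages` → `outs` layout. The function name `select_brick_outs`, the example stage names, the file name, and the hashes are all made up for the example.

```python
def select_brick_outs(lock: dict, prefix: str = "brick/") -> list:
    """Return the dvc.lock outputs that live under the given path prefix.

    Hypothetical sketch: keeps an output if its path is the prefix
    directory itself ("brick") or starts with "brick/".
    """
    root = prefix.rstrip("/")
    selected = []
    for stage in lock.get("stages", {}).values():
        for out in stage.get("outs", []):
            path = out.get("path", "")
            if path == root or path.startswith(prefix):
                selected.append(out)
    return selected


# Placeholder lock contents; the hashes below are not real.
example_lock = {
    "schema": "2.0",
    "stages": {
        "download": {
            "outs": [{"path": "download", "md5": "0" * 32 + ".dir"}],
        },
        "build": {
            "outs": [{"path": "brick/example.parquet", "md5": "1" * 32}],
        },
    },
}

brick_only = select_brick_outs(example_lock)
print([out["path"] for out in brick_only])
```

With a filter like this, an unpushed `download` output is never requested, so its absence from the remote would not cause the install to fail.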

This update addresses the error reported for the `biobricks install uniprot-kg` command. The error occurred because the CLI attempted to fetch a directory object with the MD5 hash 9361da160c0d2f37fba41ec102ff2c8e (the `.dir` entry in the error message), which had not been pushed to the DVC remote at that time.

With the latest version, this error should no longer occur, because the CLI bypasses the DVC fetch step and downloads the necessary files directly. If any further issues come up, please feel free to reach out for support.