fair-research / bdbag

Big Data Bag Utilities
https://fair-research.org
Apache License 2.0

New file hashes for existing manifest entries generated from remote-file-manifests don't get updated in bags #34

Closed: airswing closed this issue 4 years ago

airswing commented 4 years ago

When I try the examples in the CLI Guide and run:

bdbag ./test_bag/ --update --remote-file-manifest ./test-fetch-manifest.json
bdbag ./test_bag/ --resolve-fetch all
bdbag ./test_bag/ --validate full

The hash for at least the first example file doesn't match the downloaded file. The hashes in the documentation should be updated.
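To see which entry is stale independently of bdbag, the downloaded payload can be hashed and compared against manifest-sha256.txt. The following is a minimal sketch, not part of bdbag: the helper names (`sha256_of_file`, `read_manifest`, `find_mismatches`) are hypothetical, and it assumes the standard BagIt manifest layout of one `<hash> <path>` pair per line.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(manifest_path):
    """Parse manifest lines of the form '<hash> <path>' into {path: hash}."""
    entries = {}
    text = Path(manifest_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        if line.strip():
            file_hash, file_path = line.split(None, 1)
            entries[file_path.strip()] = file_hash.lower()
    return entries


def find_mismatches(bag_dir):
    """Return {path: (expected, actual)} for files whose hash differs.

    Files listed in the manifest but not yet downloaded are skipped.
    """
    bag_dir = Path(bag_dir)
    manifest = read_manifest(bag_dir / "manifest-sha256.txt")
    mismatches = {}
    for rel_path, expected in manifest.items():
        target = bag_dir / rel_path
        if target.is_file():
            actual = sha256_of_file(target)
            if actual != expected:
                mismatches[rel_path] = (expected, actual)
    return mismatches
```

Calling `find_mismatches("./test_bag")` after `--resolve-fetch all` would list each payload file whose recorded hash differs from the downloaded content, along with both values.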

More importantly, I noticed that when the remote file manifest contains a wrong or outdated SHA256 file hash and I try to correct it afterwards, the hash in manifest-sha256.txt isn't updated. bdbag says to remove the data files associated with the mismatch. I do that, then try to update the hashes again with --update --remote-file-manifest ./test-fetch-manifest.json. When I follow that with --resolve-fetch all and then --validate full, the hash still hasn't been updated and the bag still fails to validate.

I tried adding a new file to test-fetch-manifest.json, ran --update --remote-file-manifest ./test-fetch-manifest.json, and noticed that the new file was added as expected to both manifest-sha256.txt and fetch.txt.

My expectation as a user is that running --update --remote-file-manifest ./manifest.json should overwrite the entries in the bag's manifest-*.txt files when pre-existing files have new hashes.
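The expected behavior can be sketched as a merge in which remote-file-manifest entries both add new paths and overwrite existing ones. This is an illustration only, not bdbag's actual code: `merge_remote_entries` is a hypothetical helper, and it assumes each remote entry is a JSON object with `filename` and `sha256` keys and that payload paths are recorded under `data/`.

```python
def merge_remote_entries(manifest, remote_entries, algorithm="sha256"):
    """Merge remote-file-manifest entries into a {path: hash} manifest.

    New paths are added, and paths that already exist are overwritten
    when the remote entry carries a different hash. Returns the updated
    mapping and the list of paths that were added or changed.
    """
    updated = dict(manifest)  # leave the caller's mapping untouched
    changed = []
    for entry in remote_entries:
        file_hash = entry.get(algorithm)
        if not file_hash:
            continue  # entry has no hash for this algorithm
        # Assumption: payload files live under the bag's data/ directory.
        path = "data/" + entry["filename"].lstrip("/")
        new_hash = file_hash.lower()
        if updated.get(path) != new_hash:
            updated[path] = new_hash
            changed.append(path)
    return updated, changed


# Placeholder hashes for illustration, not real digests.
existing = {"data/report.csv": "a" * 64}
remote = [
    {"filename": "report.csv", "sha256": "b" * 64},  # corrected hash
    {"filename": "notes.txt", "sha256": "c" * 64},   # newly added file
]
updated, changed = merge_remote_entries(existing, remote)
```

The behavior reported above corresponds to the second case (a new file) working while the first case (a corrected hash for an existing path) is silently ignored.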

mikedarcy commented 4 years ago

Thanks for reporting this. I was able to reproduce it. Your expectations are indeed how it should behave and there is clearly a bug here. I will get to work on it and also update the documentation to correct the outdated file entry.

mikedarcy commented 4 years ago

Fixed in release 1.5.6.