CONP-PCNO / conp-dataset

:open_file_folder: A DataLad dataset for CONP
http://conp.ca
MIT License

Modify crawler behaviour to download the files when crawling so the MD5 hash can be added to the git annex URL #642

Closed cmadjar closed 3 years ago

cmadjar commented 3 years ago

Purpose

The crawler code for Zenodo and OSF needs to be updated so that files get downloaded before they get annexed. Doing so will allow the MD5 hash to be stored in the git annex object, so that we can later easily link processed files in the execution records Mandana is working on.
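For illustration only, here is a minimal sketch of the two pieces involved: hashing a file once it has been downloaded, and checking whether a git annex key carries an MD5 hash. The function names `md5_of_file` and `annex_key_has_md5` are hypothetical and are not taken from the crawler code. The check assumes that git annex keys start with their backend name (for example `MD5E-...` for hashed content versus `URL-...` for files registered without being downloaded).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def annex_key_has_md5(key):
    """Hypothetical check: True if the key's backend prefix is MD5 or MD5E.

    Assumes the key starts with the backend name followed by a dash.
    """
    backend = key.split("-", 1)[0]
    return backend in {"MD5", "MD5E"}
```

A dataset whose files report keys for which `annex_key_has_md5` returns `False` would, under these assumptions, be a candidate for recrawling.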

cmadjar commented 3 years ago

The code has already been modified in https://github.com/CONP-PCNO/conp-dataset/pull/627/files#diff-4db1ecdec0c56275d97bd8a7d85769c0ad610d06cab82ef10124fa2cf6c7fab0L81 to download the files. All that is left is recrawling the datasets.

Here is a list of datasets that do not have the MD5 hash in the git annex object URL:

joeyzhou98 commented 3 years ago

@cmadjar is this ok to close?

cmadjar commented 3 years ago

Yep :)