AMP-SCZ / lochness

Download your data to a data lake.

fix: mediaflux checksum #126

Closed kcho closed 1 year ago

kcho commented 1 year ago

We expected unimelb-mf-download to compare the checksum of each local file against the source data on Mediaflux. However, when a file already exists at the path given to the --out argument of unimelb-mf-download, the download is skipped entirely, regardless of the file's content.

The following lines reproduce the bug:

# create an empty file
touch test.zip

unimelb-mf-download \
    --mf.config mflux.cfg \
    --out ./ \
    --csum-check \
    /projects/proj-5070_prescient-1128.4.380/PrescientXX/test.zip

This returns:

Skipped asset 1264700582: '/projects/proj-5070_prescient-1128.4.380/PrescientXX/test.zip'. Already exists.

This PR extracts and saves the CRC32 checksum reported by unimelb-mf-check, then uses it to re-download files that have changed on Mediaflux.
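
As a rough illustration of the comparison step only, and not the code in this PR, the sketch below computes a local file's CRC32 with Python's standard `zlib` module and compares it with a checksum value obtained elsewhere. The function names `local_crc32` and `needs_redownload` are hypothetical, and the sketch assumes the remote checksum is available as a hexadecimal string; how that value is parsed from the unimelb-mf-check output is not shown here.

```python
import zlib
from pathlib import Path


def local_crc32(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an 8-character lowercase hex string."""
    crc = 0
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")


def needs_redownload(path, remote_crc32):
    """Return True if the local file is missing or differs from the remote.

    remote_crc32 is assumed to be a hex string (hypothetical input format).
    """
    path = Path(path)
    if not path.is_file():
        return True
    return local_crc32(path) != remote_crc32.lower().zfill(8)
```

With this check, the empty test.zip from the reproduction above would be flagged for re-download whenever the remote checksum is not 00000000, instead of being skipped because the path already exists.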