dvolgyes / zenodo_get

Zenodo_get: Downloader for Zenodo records
GNU Affero General Public License v3.0

`keep` not working #29

Open IgnacioHeredia opened 3 months ago

IgnacioHeredia commented 3 months ago

Unless I'm missing something very obvious, I don't think --keep is working as expected.

Reproduce:

  1. download record
  2. modify file
  3. download record with --keep

The file is downloaded again, ignoring --keep and overwriting my changes. I would expect nothing to be downloaded the second time.

-k : keep files: it will keep files with invalid md5 checksum. The main purpose is debugging.

Version 1.6.1.

Thanks!

❯ zenodo_get -r 5211721 --keep
Title: [K2(bimpm)NiMe2]2
Keywords: 
Publication date: 2021-08-17
DOI: 10.5281/zenodo.5211721
Total size: 6.6 MB

Link: https://zenodo.org/records/5211721/files/Compound_6.cif   size: 2.7 MB
100% [..........................................................................] 2691377 / 2691377
Checksum is correct. (9ad8e7b4e58d52b857e83b17443393c6)

Link: https://zenodo.org/records/5211721/files/Compound_6_1H NMR.txt   size: 599.5 kB
100% [............................................................................] 599527 / 599527
Checksum is correct. (2c0554349547458ed42fa9263b93ec4c)

Link: https://zenodo.org/records/5211721/files/HSQC_Compound_6.mnova   size: 1.9 MB
100% [..........................................................................] 1873091 / 1873091
Checksum is correct. (adc9639d260a04a256aaa68faf1b5703)

Link: https://zenodo.org/records/5211721/files/Compound_6_13C NMR.mnova   size: 366.4 kB
100% [............................................................................] 366381 / 366381
Checksum is correct. (58762e3204b21e4cb58b7a0c323d7705)

Link: https://zenodo.org/records/5211721/files/Compound_6_13C NMR.txt   size: 720.5 kB
100% [............................................................................] 720539 / 720539
Checksum is correct. (3c8f62aa64a93adccb8b603351f81685)

Link: https://zenodo.org/records/5211721/files/Compound_6_1H NMR.mnova   size: 343.8 kB
100% [............................................................................] 343824 / 343824
Checksum is correct. (241a3a554bce4d6e2a9c8b0781436178)
All files have been downloaded.

~/misc/test-zenodo
❯ echo "update content" > Compound_6_1H\ NMR.txt

~/misc/test-zenodo
❯ zenodo_get -r 5211721 --keep                  
Title: [K2(bimpm)NiMe2]2
Keywords: 
Publication date: 2021-08-17
DOI: 10.5281/zenodo.5211721
Total size: 6.6 MB

Link: https://zenodo.org/records/5211721/files/Compound_6.cif   size: 2.7 MB
Compound_6.cif is already downloaded correctly.

Link: https://zenodo.org/records/5211721/files/Compound_6_1H NMR.txt   size: 599.5 kB
100% [............................................................................] 599527 / 599527
Checksum is correct. (2c0554349547458ed42fa9263b93ec4c)

Link: https://zenodo.org/records/5211721/files/HSQC_Compound_6.mnova   size: 1.9 MB
HSQC_Compound_6.mnova is already downloaded correctly.

Link: https://zenodo.org/records/5211721/files/Compound_6_13C NMR.mnova   size: 366.4 kB
Compound_6_13C NMR.mnova is already downloaded correctly.

Link: https://zenodo.org/records/5211721/files/Compound_6_13C NMR.txt   size: 720.5 kB
Compound_6_13C NMR.txt is already downloaded correctly.

Link: https://zenodo.org/records/5211721/files/Compound_6_1H NMR.mnova   size: 343.8 kB
Compound_6_1H NMR.mnova is already downloaded correctly.
All files have been downloaded.

serasset commented 3 months ago

My understanding of the --keep option is:

Without --keep (normal behaviour)

  1. download a file from zenodo
  2. check md5
  3. if md5 mismatch delete the downloaded file (do not KEEP it)

With --keep

  1. download a file from zenodo
  2. check md5
  3. if md5 mismatch KEEP the downloaded file

You are modifying the LOCAL file, so at step 1 zenodo_get sees that it no longer matches the copy on Zenodo, downloads it again, and proceeds with the remaining steps.

The --keep option has nothing to do with KEEPING already downloaded files that have been modified locally.
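
As a rough illustration of the flow described above (not zenodo_get's actual code), the per-file logic could be sketched like this. The names `md5_of` and `fetch_one`, the `download` callback, and the returned status strings are all hypothetical and exist only for this example.

```python
import hashlib
import os
from typing import Callable


def md5_of(path: str) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_one(path: str, expected_md5: str,
              download: Callable[[str], None], keep: bool = False) -> str:
    """Hypothetical per-file flow; `download` stands in for the real HTTP fetch."""
    # Pre-check: a local file that matches the remote checksum is skipped.
    if os.path.exists(path) and md5_of(path) == expected_md5:
        return "already downloaded"
    # Step 1: (re)download; this overwrites a locally modified file.
    download(path)
    # Step 2: verify the checksum of what was just downloaded.
    if md5_of(path) == expected_md5:
        return "checksum ok"
    # Step 3: on mismatch, keep only decides whether the bad file survives.
    if not keep:
        os.remove(path)
        return "mismatch, deleted"
    return "mismatch, kept"
```

In this sketch, `keep` is only consulted after a fresh download fails its checksum, which is why a locally edited file is still overwritten.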

dvolgyes commented 1 month ago

@serasset That's the correct interpretation. I meant it as: if there is a checksum mismatch, whether or not to keep the "damaged/mismatched" file.

@IgnacioHeredia So you want to avoid downloading files which are already downloaded? I will look into it, but honestly, I wrote this tool over a weekend during my PhD out of annoyance, and the code quality and design reflect that, so I have to recall the details every time. I am quite busy, but I will see what I can do.

IgnacioHeredia commented 1 month ago

No worries @dvolgyes, it's not a pressing issue. Thanks for the tool!