dvolgyes / zenodo_get

Zenodo_get: Downloader for Zenodo records
GNU Affero General Public License v3.0

KeyError: 'size' when downloading zenodo dataset #19

Closed: aladinor closed this issue 1 year ago

aladinor commented 1 year ago

Hi everyone,

I am trying to download a Zenodo dataset. I am using the zenodo_get command as follows:

zenodo_get 10.5281/zenodo.8374585

but then a KeyError appears. Am I doing something wrong? Are there any examples that people can use as guidance?

Traceback (most recent call last):
  File "/home/alfonso/mambaforge/envs/quetame/bin/zenodo_get", line 8, in <module>
    sys.exit(zenodo_get())
             ^^^^^^^^^^^^
  File "/home/alfonso/mambaforge/envs/quetame/lib/python3.11/site-packages/zenodo_get/zget.py", line 297, in zenodo_get
    total_size = sum(f['size'] for f in files)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alfonso/mambaforge/envs/quetame/lib/python3.11/site-packages/zenodo_get/zget.py", line 297, in <genexpr>
    total_size = sum(f['size'] for f in files)
                     ~^^^^^^^^
KeyError: 'size'

Cheers,

Alfonso

tcompa commented 1 year ago

Hi there, I'm not a maintainer here, but this is most likely due to a recent upgrade of Zenodo (see e.g. https://help.zenodo.org/docs/about/whats-changed).

We fixed a similar issue in our project (see https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/568/files), where we had to change the names of some properties to align with the new Zenodo.

Given the specification for the new Zenodo API responses (I couldn't find them right away, but I guess they should be available somewhere), it would be possible to update zenodo-get accordingly.

dvolgyes commented 1 year ago

Thanks for the error report, and thanks also to @tcompa for the reference. I am a bit busy, with 3 papers to revise this week, so I will try to find some time to fix it, but it will probably be next week. (Sorry, the whole code was just a weekend project during my PhD, and a proper refactoring has been overdue for several years now, so maintenance is a bit fragile.)

dvolgyes commented 1 year ago

I pushed a quick fix. It is not well tested, but it is supposed to work. (I also bumped it to Python 3.12 compatibility; it seems to work with conda.)

dvolgyes commented 1 year ago

Pushed to both git and PyPI, so 1.5 is supposed to work now.

tcompa commented 1 year ago

Thanks! I confirm that version 1.5 works for my (admittedly very simple) use:

zenodo_get 8287221
Title: hiPSC 3D immunofluorescence images, tiny test set
Keywords: immunofluorescence, high content imaging, image analysis
Publication date: 2023-08-27
DOI: 10.5281/zenodo.8287221
Total size: 25.7 MB

Link: https://zenodo.org/record/8287221/files/20200812-CardiomyocyteDifferentiation14-Cycle1_B03_T0001F002L01A01Z01C01.png   size: 6.4 MB
100% [..........................................................................] 6695455 / 6695455
Checksum is correct. (a3b0be2af486e08d1f009831d8656b80)

Link: https://zenodo.org/record/8287221/files/MeasurementData.mlf   size: 0.0 MB
100% [................................................................................] 1653 / 1653
Checksum is correct. (08898b37193727874b45c65a11754db9)

Link: https://zenodo.org/record/8287221/files/MeasurementDetail.mrf   size: 0.0 MB
100% [................................................................................] 1183 / 1183
Checksum is correct. (5fce4ca3e5ebc5f5be0b4945598e1ffb)

Link: https://zenodo.org/record/8287221/files/fractal_example_workflow.json   size: 0.0 MB
100% [................................................................................] 2308 / 2308
Checksum is correct. (6ec46cf7bc434ca0ef059605708bc82c)

Link: https://zenodo.org/record/8287221/files/20200812-CardiomyocyteDifferentiation14-Cycle1_B03_T0001F002L01A01Z02C01.png   size: 6.4 MB
100% [..........................................................................] 6726558 / 6726558
Checksum is correct. (f1e0d50a1654ffd079504a036ff4a9e3)

Link: https://zenodo.org/record/8287221/files/20200812-CardiomyocyteDifferentiation14-Cycle1_B03_T0001F001L01A01Z01C01.png   size: 6.4 MB
100% [..........................................................................] 6751304 / 6751304
Checksum is correct. (41c5d3612f166d30d694a6c9902a5839)

Link: https://zenodo.org/record/8287221/files/20200812-CardiomyocyteDifferentiation14-Cycle1_B03_T0001F001L01A01Z02C01.png   size: 6.5 MB
100% [..........................................................................] 6775326 / 6775326
Checksum is correct. (3aa92682cf731989cf4d3e0015f59ce0)
All files have been downloaded.

sambugu commented 1 year ago

Oh, well. It looks like Zenodo has now reverted to 'key' & 'size' in place of 'filename' & 'filesize'.

dvolgyes commented 1 year ago

Wonderful, an A for API stability... I will finish breakfast, then make a new version that accepts both. I doubt this is the last time they change it.
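For readers hitting the same KeyError, here is a minimal sketch of what "accepting both" field spellings could look like. It is not the code that went into zenodo_get; the helper names `first_present`, `file_name` and `file_size` are hypothetical, and the sample entries are made up for illustration. Only the field names 'key'/'size' and 'filename'/'filesize' come from this thread.

```python
from typing import Any, Mapping, Sequence


def first_present(entry: Mapping[str, Any], names: Sequence[str]) -> Any:
    """Return the value of the first field in `names` that exists in `entry`."""
    for name in names:
        if name in entry:
            return entry[name]
    # Fail with a message that lists every spelling that was tried.
    raise KeyError(f"none of {list(names)} found in file entry")


def file_name(entry: Mapping[str, Any]) -> str:
    # 'key' and 'filename' are the two spellings mentioned in this thread.
    return str(first_present(entry, ("key", "filename")))


def file_size(entry: Mapping[str, Any]) -> int:
    # 'size' and 'filesize' are the two spellings mentioned in this thread.
    return int(first_present(entry, ("size", "filesize")))


# Made-up entries, one per variant, to show that both are handled.
files = [
    {"key": "a.png", "size": 10},
    {"filename": "b.json", "filesize": 5},
]
total_size = sum(file_size(f) for f in files)
```

With a lookup like this, the line from the traceback, `sum(f['size'] for f in files)`, becomes `sum(file_size(f) for f in files)` and no longer depends on which variant the API returns.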

dvolgyes commented 1 year ago

I pushed 1.5.1 to git and PyPI; it should be compatible with both variants, we will see. (Remark: they also reverted the hash value, so it seems the whole API was reverted, which is rare. But none of the changes were documented here: https://developers.zenodo.org/#changes )

@sambugu Could you check if it works for you?
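The thread does not say what the two hash-value variants looked like. As a hedged sketch, assume that one variant carries an `md5:` prefix and the other is the bare hex digest, which is the form printed in the "Checksum is correct." lines of the download log above. Under that assumption, a check that tolerates both could look like the following; the function names are hypothetical and not part of zenodo_get.

```python
import hashlib


def normalize_md5(checksum: str) -> str:
    """Lowercase the reported checksum and strip an optional 'md5:' prefix.

    The 'md5:' prefix is an assumption; the thread only says the hash
    value changed between API versions.
    """
    value = checksum.strip().lower()
    if value.startswith("md5:"):
        value = value[len("md5:"):]
    return value


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, reported: str) -> bool:
    """Compare the file's MD5 with the reported value in either form."""
    return md5_of_file(path) == normalize_md5(reported)
```

Normalizing the reported value before comparing keeps the comparison itself unchanged if the API switches between the two forms again.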

sambugu commented 1 year ago

OK. I just replaced zget with the new script, and everything seems to work. Thanks!

aladinor commented 1 year ago

Thanks everyone for your help