codeforIATI / IATI-Stats

Python application for generating JSON stats files from IATI data
https://stats.codeforiati.org
Other

Oxfam Novib data not appearing #16

Closed: markbrough closed this issue 2 years ago

markbrough commented 3 years ago

Oxfam Novib publishes three files, two of which are around 50MB: https://analytics.codeforiati.org/publisher/onl.html

The files pass all the checks on the validator: https://iativalidator.iatistandard.org/organisation/onl

However, according to Code for IATI analytics, there are no activities. Maybe the files are too large to process?

andylolz commented 3 years ago

Maybe the files are too large to process?

Yes, that’s right. The files are larger than 50MB, which is the hard file size limit set here: https://github.com/codeforIATI/IATI-Stats/blob/10099818cfa4096e91c1e41d24aeb9508b23e193/statsrunner/loop.py#L66-L69

So they’re recorded as “too large”. See e.g.: https://github.com/codeforIATI/IATI-Stats-public/blob/9bf451ba/current/aggregated-file/onl/onl-activity.xml/toolarge.json
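Roughly, the behaviour described above looks like the sketch below. This is not the actual code in `loop.py`. The function name `process_file`, the `toolarge.json` payload, and the assumption that the limit is in decimal megabytes (10^6 bytes) are all illustrative guesses. The idea is to check the file's size on disk before parsing it. If the file is over the limit, the sketch writes a marker file instead of stats.

```python
import json
import os

# Hypothetical limit: 50MB, assuming decimal megabytes (10**6 bytes).
SIZE_LIMIT_BYTES = 50 * 10**6


def process_file(xml_path, output_dir, limit=SIZE_LIMIT_BYTES):
    """Sketch only: skip files over `limit` bytes and record them as too large.

    Returns True if the file was small enough to process, False otherwise.
    """
    os.makedirs(output_dir, exist_ok=True)
    size = os.path.getsize(xml_path)
    if size > limit:
        # Write a marker instead of stats. The payload shape is an assumption.
        with open(os.path.join(output_dir, "toolarge.json"), "w") as f:
            json.dump({"file_size": size, "limit": limit}, f)
        return False
    # ... parse the XML and generate stats here (omitted) ...
    return True
```

Making the limit a parameter means it can be raised (for example to 60MB) without touching the rest of the logic.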

I’m struggling to figure out what the official line on this is. The registry asks publishers to limit their files to 40MB: https://github.com/IATI/ckanext-iati/blob/0c95c911/ckanext/iati/theme/templates/package/snippets/package_basic_fields.html#L146

The registry refresher appears to set the limit at 60MB: https://github.com/IATI/ckanext-iati/blob/6ec91098/ckanext/iati/archiver.py#L34-L36

I can’t seem to find anything in the publishing guidance on iatistandard.org about this.

andylolz commented 3 years ago

Just to complicate this a bit further…

The file sizes shown on the IATI Registry suggest these files are actually smaller than 50MB (Registry screenshot, 2021-11-12).

That’s because the file sizes shown are not actually in megabytes (i.e. 10^6 bytes), but are instead in mebibytes (i.e. 2^20 bytes).
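Here is a worked example of why this matters. Suppose the Registry displays a file as 48 (this number is illustrative and is not taken from the screenshot). If that figure is in mebibytes, the file is 48 × 2^20 = 50,331,648 bytes. That is over a 50MB limit of 50 × 10^6 = 50,000,000 bytes, assuming the stats limit is in decimal megabytes. More generally, any file shown as larger than about 47.68 on the Registry is already over 50MB.

```python
MB = 10**6    # megabyte (decimal, SI)
MIB = 2**20   # mebibyte (binary, IEC)


def mib_to_mb(size_mib):
    """Convert a size in mebibytes to megabytes."""
    return size_mib * MIB / MB


def exceeds_limit(displayed_mib, limit_mb=50):
    """True if a size shown in MiB is over a limit expressed in decimal MB."""
    return displayed_mib * MIB > limit_mb * MB


print(mib_to_mb(48))       # 50.331648
print(exceeds_limit(48))   # True
print(exceeds_limit(47))   # False
```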

andylolz commented 3 years ago

As far as analytics is concerned, I think it’s a bug that this doesn’t show up as a data quality issue: https://analytics.codeforiati.org/publisher/onl.html#h_dataquality
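An analytics page could surface these files by looking for the `toolarge.json` markers in the stats output. The sketch below assumes the directory layout `<aggregated_dir>/<publisher>/<filename>/toolarge.json`, inferred from the IATI-Stats-public URL linked earlier in this thread. The function names `find_too_large` and `data_quality_warnings` are hypothetical, and the warning text is only an example.

```python
import os


def find_too_large(aggregated_dir, publisher):
    """Return names of a publisher's files that were skipped as too large.

    Assumes the layout <aggregated_dir>/<publisher>/<filename>/toolarge.json.
    """
    publisher_dir = os.path.join(aggregated_dir, publisher)
    if not os.path.isdir(publisher_dir):
        return []
    return sorted(
        name
        for name in os.listdir(publisher_dir)
        if os.path.isfile(os.path.join(publisher_dir, name, "toolarge.json"))
    )


def data_quality_warnings(aggregated_dir, publisher):
    """Build human-readable warnings that a data quality section could show."""
    return [
        f"{name} was not processed because it exceeds the file size limit"
        for name in find_too_large(aggregated_dir, publisher)
    ]
```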

I also wonder if the file size limit should be increased to 60MB, to be consistent with the registry refresher limit. @markbrough what do you think?

andylolz commented 3 years ago

I’ve created #17 and codeforIATI/analytics#60, which should hopefully address this.

andylolz commented 3 years ago

#17 is now fixed, so this should be resolved in tomorrow’s update.

andylolz commented 2 years ago

This looks to be resolved now.