Closed markbrough closed 2 years ago
Maybe the files are too large to process?
Yes, that’s right. The files are larger than 50MB, which is set as the hard file size limit just here: https://github.com/codeforIATI/IATI-Stats/blob/10099818cfa4096e91c1e41d24aeb9508b23e193/statsrunner/loop.py#L66-L69
So they’re recorded as “too large”. See e.g.: https://github.com/codeforIATI/IATI-Stats-public/blob/9bf451ba/current/aggregated-file/onl/onl-activity.xml/toolarge.json
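For anyone following along, here is a rough sketch of what a hard size check of this kind could look like. It is not the code from `statsrunner/loop.py`. The function name `check_file_size`, the constant `MAX_FILE_SIZE_BYTES` and the shape of the returned dict are all made up for illustration, and it assumes the 50MB limit is counted in decimal bytes.

```python
import os

# Assumption: the 50MB limit is counted in decimal bytes (50 * 10**6).
# The real constant lives in statsrunner/loop.py and may be written differently.
MAX_FILE_SIZE_BYTES = 50 * 1000 * 1000


def check_file_size(path, limit=MAX_FILE_SIZE_BYTES):
    """Report the size of the file at `path` and whether it exceeds `limit`.

    Hypothetical helper: a file over the limit would be flagged rather than
    parsed, which is why no activities are counted for it.
    """
    size = os.path.getsize(path)
    return {"path": path, "size_bytes": size, "toolarge": size > limit}
```

With a check like this, a file that is even one byte over the limit is skipped entirely, so it contributes nothing to the activity counts.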
I’m struggling to figure out what the official line on this is. The registry asks publishers to limit their files to 40MB: https://github.com/IATI/ckanext-iati/blob/0c95c911/ckanext/iati/theme/templates/package/snippets/package_basic_fields.html#L146
The registry refresher appears to set the limit at 60MB: https://github.com/IATI/ckanext-iati/blob/6ec91098/ckanext/iati/archiver.py#L34-L36
I can’t seem to find anything in the publishing guidance on iatistandard.org about this.
Just to complicate this a bit further…
The file sizes shown on the IATI Registry suggest these files are actually smaller than 50MB:
That’s because the file sizes shown are not actually in megabytes (i.e. 10^6 bytes), but are instead in mebibytes (i.e. 2^20 bytes).
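To make the difference concrete, here is a small sketch that converts one byte count into both units. The 51,000,000 byte figure is a made-up example, not the actual size of either Oxfam Novib file, and the helper names `to_mb` and `to_mib` are illustrative only.

```python
MB = 10**6    # megabyte: decimal, 1,000,000 bytes
MIB = 2**20   # mebibyte: binary, 1,048,576 bytes


def to_mb(size_bytes):
    """Convert a byte count to megabytes."""
    return size_bytes / MB


def to_mib(size_bytes):
    """Convert a byte count to mebibytes."""
    return size_bytes / MIB


# Hypothetical file size, chosen to fall between 50 MB and 50 MiB.
size_bytes = 51_000_000

print(f"{to_mb(size_bytes):.1f} MB")    # 51.0 MB
print(f"{to_mib(size_bytes):.1f} MiB")  # 48.6 MiB
```

Any file between 50,000,000 and 52,428,800 bytes (50 MiB) would be displayed as under 50 when measured in mebibytes, while still exceeding a 50MB limit counted in decimal bytes.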
As far as analytics is concerned, I think it’s a bug that this doesn’t show up as a data quality issue: https://analytics.codeforiati.org/publisher/onl.html#h_dataquality
I also wonder if the file size limit should be increased to 60MB, to be consistent with the registry refresher limit. @markbrough what do you think?
I’ve created #17 and codeforIATI/analytics#60, which should hopefully address this.
This looks to be resolved now.
Oxfam Novib publishes three files, two of which are around 50MB: https://analytics.codeforiati.org/publisher/onl.html
The files pass all the checks on the validator: https://iativalidator.iatistandard.org/organisation/onl
However, according to Code for IATI analytics, there are no activities. Maybe the files are too large to process?