codeforIATI / iati-data-dump

📷 A daily snapshot of all IATI data on the IATI Registry
https://iati-data-dump.codeforiati.org
GNU General Public License v3.0

Archive/Cache of Data Dump #11

Open eastcoasting opened 1 month ago

eastcoasting commented 1 month ago

Hi, thank you for this valuable service. I am using the data dump to pull source data for further processing and analysis. When the zip location was switched over (https://github.com/codeforIATI/iati-data-dump/pull/10), the historical archive of data appears to have been lost, as it was preserved in the GitLab git history. Is it preserved but not publicly available?

This historical data has been useful in producing fallbacks when files are offline (e.g., Norad's activity XML file link switched recently, but has not caught up), or reconstructing transaction data for organizations that replace values in place (e.g., many of Germany's expenditure transactions). While some of these are obviously issues on the government data provider side, I'd be interested in trying to preserve this resource.
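To make the fallback use case concrete, here is a minimal sketch of a local archive of dated copies. It assumes a hypothetical layout of `archive_dir/YYYY-MM-DD/<filename>`. The function names `archive_copy` and `latest_copy` are illustrative only and are not part of the data dump or any IATI tool.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def archive_copy(archive_dir: Path, name: str, content: bytes, day: date) -> Path:
    """Store one day's copy of a file under archive_dir/YYYY-MM-DD/name.

    The directory layout is an assumption made for this sketch.
    """
    target = archive_dir / day.isoformat() / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target


def latest_copy(
    archive_dir: Path, name: str, on_or_before: Optional[date] = None
) -> Optional[Path]:
    """Return the newest archived copy of `name`, or None if none exists.

    If `on_or_before` is given, copies from later days are ignored.
    """
    if not archive_dir.is_dir():
        return None
    candidates = []
    for day_dir in archive_dir.iterdir():
        if not day_dir.is_dir():
            continue
        try:
            day = date.fromisoformat(day_dir.name)
        except ValueError:
            continue  # skip directories that are not named as ISO dates
        if on_or_before is not None and day > on_or_before:
            continue
        path = day_dir / name
        if path.is_file():
            candidates.append((day, path))
    if not candidates:
        return None
    # Pick the copy with the most recent date.
    return max(candidates, key=lambda item: item[0])[1]
```

When a publisher's file is offline, `latest_copy(archive_dir, name)` would give the most recent copy still held locally. Comparing copies from different days is one way to spot transaction values that were replaced in place.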

andylolz commented 2 weeks ago

Hi @eastcoasting, apologies for the delayed response and thanks for your interest in this project.

When the change in #10 was made, this project effectively changed hands. As such, you’re best off raising this issue here: https://github.com/OpenDataServices/iati-data-dump-2

(It’s a bit of a muddle, because the code for the static site still comes from this repo. I reached out to the new owners when I saw your message, but to no avail. I’ve also updated the footer of the site, to try and clarify this a bit.)

I’m afraid I’m not able to help more than that. I agree that the git repo was a useful thing, but unfortunately it outgrew the free limits of GitHub and then of GitLab.

odscjames commented 6 days ago

Hi,

I'm afraid that with the new service we are unable to offer a full historical record. This is in keeping with the IATI Secretariat's data use policy, which recognises that a publisher may need to retract a piece of data published in error. When a publisher wishes to remove a piece of data, our tools respect that. https://iatistandard.org/en/data-removal/

With regard to your first use case of servers going down, we do recognise there is a valid case for keeping old data. Some of our tools already do this (as outlined in the page above). Our new Bulk Data Service will offer files for 3 days after a server error. Does this help? More information is at https://www.iaticonnect.org/group/9/topic/preview-feedback-updates-unified-platform-data-pipeline

If you have any further questions you can contact us at https://iatistandard.org/en/guidance/get-support/