codeforIATI / iati-ideas

💡 Ideas for new codeforIATI projects and blogs
https://ideas.codeforiati.org

Create and host a (horizontally scalable) IATI Registry Refresher #7

Closed andylolz closed 5 years ago

andylolz commented 5 years ago

The tool that’s commonly used by IATI services for downloading all IATI data is the IATI Registry Refresher. It’s super simple – it makes a list of every dataset, then loops over that list, downloading them.

The original IATI Datastore doesn’t use this. Instead, it does everything in a task-based way. Tasks (e.g. "download this dataset") get queued up, and whenever a worker is free, a task is dequeued and run. Because more workers can be added to pull from the same queue, this is horizontally scalable, which is really handy if we’re looking to scale up as the amount of published data increases.
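To make the task-based idea concrete, here is a minimal sketch of the pattern, not the datastore's actual code. The names `run_download_tasks` and `download` are made up for illustration, and threads in a single process stand in for what would really be separate worker processes or machines sharing a queue.

```python
import queue
import threading


def run_download_tasks(dataset_urls, download, num_workers=4):
    """Queue one task per dataset; idle workers dequeue and run them.

    `download` is a hypothetical callable that takes a URL and returns
    whatever the caller wants to record for that dataset.
    """
    tasks = queue.Queue()
    for url in dataset_urls:
        tasks.put(url)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to do, so this worker stops
            try:
                outcome = download(url)
            except Exception as exc:
                # One failing dataset should not stop the other tasks.
                outcome = exc
            with results_lock:
                results[url] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Scaling up here just means raising `num_workers`. In a real deployment the in-memory queue would presumably be replaced by a shared one that workers on different machines can all reach.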

The datastore also respects ETags, so it only downloads a dataset if the host says it has changed. Popular IATI hosting service AidStream uses ETags, so it is worth honouring them. The datastore also has a pretty thorough test suite, which is cool.
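For anyone unfamiliar with ETags, honouring them roughly means sending the last ETag you saw in an `If-None-Match` header and skipping the download when the host answers `304 Not Modified`. A rough sketch using only the Python standard library follows. The function name `fetch_if_changed` and the `urlopen` parameter are assumptions for illustration, and this is not how the datastore itself implements it.

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, cached_etag=None, urlopen=urllib.request.urlopen):
    """Return (body, etag), or (None, cached_etag) if the host says nothing changed.

    `urlopen` is injectable only so the function can be exercised
    without touching the network.
    """
    request = urllib.request.Request(url)
    if cached_etag:
        # Ask the host to send the body only if its ETag differs from ours.
        request.add_header("If-None-Match", cached_etag)
    try:
        with urlopen(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: keep the copy we already have.
            return None, cached_etag
        raise
```

The caller would store the returned ETag alongside each dataset and pass it back in as `cached_etag` on the next run.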

I’m interested in taking these ideas from the datastore to make a service with the following features:

andylolz commented 5 years ago

This (approximately) exists: https://github.com/codeforIATI/parallel-registry-refresher