seustachi opened 10 years ago
We agreed on a solution for this in a recent CPS planning call, but I'm not sure it is documented anywhere except our notes. Leaving this one open and prioritized until we have a documented plan for how to do it.
Current situation:
We import many indicator values into CPS from ScraperWiki (SW). SW publishes files on CKAN on a daily basis. We call the main one the "ScraperWiki dataset", but there are others.
The published file contains all the values, for all countries, of a fairly large list of indicators.
Everything is republished every day; SW does not perform a diff against the previous publication.
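Since the whole file is republished each time, any consumer that wants only the changes has to compute the diff itself. A minimal sketch of what that could look like, assuming CSV dumps; the file names and row layout below are placeholders, not the real SW/CKAN resource names:

```python
import csv

def row_key(row):
    # Treat the full row as the identity, so any changed value counts as "new".
    return tuple(row)

def new_rows(previous_path, current_path):
    """Return rows present in today's dump but absent from yesterday's."""
    with open(previous_path, newline="") as f:
        seen = {row_key(r) for r in csv.reader(f)}
    with open(current_path, newline="") as f:
        return [r for r in csv.reader(f) if row_key(r) not in seen]

# Hypothetical usage (the file names are placeholders):
# changed = new_rows("sw_dump_yesterday.csv", "sw_dump_today.csv")
```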
Every time we import a file, CPS tries to import all of its values. From the CPS point of view, there is no simple way to know what is new and what is not, so we attempt to write every single value to the database and silently ignore the resulting duplicate errors.
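That "write everything, swallow the duplicates" approach is essentially an idempotent insert. A minimal sketch of the pattern, using an in-memory sqlite3 database purely for illustration; CPS's real schema is not described in this issue, so the table and column names are invented:

```python
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE indicator_value (
        indicator TEXT,
        country   TEXT,
        period    TEXT,
        value     REAL,
        UNIQUE (indicator, country, period)
    )
""")

rows = [("pop_total", "KE", "2014", 45.5)]  # hypothetical values

insert = "INSERT OR IGNORE INTO indicator_value VALUES (?, ?, ?, ?)"
conn.executemany(insert, rows)  # first import: the row is written
conn.executemany(insert, rows)  # re-import: the duplicate is silently skipped

count = conn.execute("SELECT COUNT(*) FROM indicator_value").fetchone()[0]
print(count)  # 1 -- no duplication, and no error raised
```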
The state of CPS at the end of the process is correct: new data are there, and old data are not duplicated. But we have 2 issues:
Most of the indicators are not updated more than once a year. Processing them every day makes no sense.
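One way to make that observation actionable is to bucket indicators by how recently their values last changed, and only reprocess the fast-moving bucket daily. A sketch under assumed data shapes; the indicator names, dates, and the 90-day threshold are illustrative, not agreed values:

```python
from datetime import date, timedelta

# Hypothetical: the last date on which each indicator's values changed.
last_changed = {
    "refugee_count": date(2014, 11, 2),   # updated frequently
    "gdp_per_capita": date(2014, 1, 15),  # updated roughly yearly
}

def bucket(indicator, today, threshold_days=90):
    """Classify an indicator as 'frequent' or 'infrequent'."""
    age = today - last_changed[indicator]
    return "frequent" if age <= timedelta(days=threshold_days) else "infrequent"

groups = {"frequent": [], "infrequent": []}
for name in last_changed:
    groups[bucket(name, today=date(2014, 11, 10))].append(name)

# Only the 'frequent' group would then be imported daily.
print(groups)  # {'frequent': ['refugee_count'], 'infrequent': ['gdp_per_capita']}
```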
What I recommend:
Split the content of the file following 2 rules:
If we agree on that, what do we need?
For 1, the split into groups of indicators, we need the data team's contribution.
For 2, this is just technical coordination between SW, CKAN and CPS; namely, Dragon, CJ and Samuel.