aiddata / gcdf-geospatial-data

Repository for AidData's Geospatial Global Chinese Development Finance Dataset (GeoGCDF)
https://aiddata.org/china

Process for updating existing projects #18

Closed: sgoodm closed this issue 3 years ago

sgoodm commented 3 years ago

How can we rerun the build to only process/update projects which have changed?

Should this use some change detection by comparing to a specified previous build, or require a separate input which consists of only the projects to be updated?
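To make the change-detection option concrete, here is a minimal sketch, assuming each build can expose its project records as a dict keyed by project ID. The function names, the record fields ("name", "osm"), and the hashing approach are all hypothetical illustrations, not part of the existing build code.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Return a stable hash of a project record (key order does not matter)."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_project_ids(previous: dict, current: dict) -> list:
    """Return IDs that are new, or whose record differs from the previous build."""
    changed = []
    for project_id, record in current.items():
        old = previous.get(project_id)
        if old is None or record_fingerprint(old) != record_fingerprint(record):
            changed.append(project_id)
    return sorted(changed)


# Hypothetical records: 123 was edited, 456 is unchanged, 789 is new.
previous_build = {
    123: {"name": "Road", "osm": "way/1"},
    456: {"name": "Port", "osm": "way/2"},
}
current_input = {
    123: {"name": "Road", "osm": "way/9"},
    456: {"osm": "way/2", "name": "Port"},
    789: {"name": "Dam", "osm": "way/3"},
}
ids_to_update = changed_project_ids(previous_build, current_input)
```

The alternative (a separate input listing only the projects to update) skips the comparison entirely and leaves it to whoever prepares the input to know what changed.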

Another option is to break out the GeoJSON creation process so that it can be called from a separate script dedicated to running updates, or even manually. E.g., if we know project "123" was updated, we run python update.py 123, which creates a new GeoJSON for that project. We would then also break out the process that creates the global GeoJSONs and run it based purely on the contents of the folder containing the individual GeoJSONs.
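A rough sketch of that split, assuming one GeoJSON file per project named by its ID. The function names, the file naming scheme, and the feature properties are hypothetical; the real build may structure its outputs differently.

```python
import json
import tempfile
from pathlib import Path


def write_project_geojson(project_id, geometry, out_dir):
    """Write (or overwrite) the GeoJSON file for a single project."""
    feature = {
        "type": "Feature",
        "properties": {"id": project_id},
        "geometry": geometry,
    }
    collection = {"type": "FeatureCollection", "features": [feature]}
    path = Path(out_dir) / f"{project_id}.geojson"
    path.write_text(json.dumps(collection))
    return path


def build_global_geojson(in_dir):
    """Merge every per-project file found in in_dir into one FeatureCollection."""
    features = []
    for path in sorted(Path(in_dir).glob("*.geojson")):
        features.extend(json.loads(path.read_text())["features"])
    return {"type": "FeatureCollection", "features": features}


with tempfile.TemporaryDirectory() as tmp:
    write_project_geojson(123, {"type": "Point", "coordinates": [0.0, 0.0]}, tmp)
    write_project_geojson(456, {"type": "Point", "coordinates": [1.0, 1.0]}, tmp)
    # Project 123 changed, so only its file is regenerated.
    write_project_geojson(123, {"type": "Point", "coordinates": [2.0, 2.0]}, tmp)
    # The global file is rebuilt purely from the folder contents.
    combined = build_global_geojson(tmp)
```

Because the global step only reads the folder, it does not need to know which projects were updated or how.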

sgoodm commented 3 years ago

I've added functionality that uses the config to put the build into "update_mode", specify a list of project IDs to update, and give the timestamp of the previous build that the updates will be part of.

For example, with update_mode = True, update_ids = [123], and update_timestamp = 2020_10_31_23_59, the build will use the input data (which has been updated to reflect any changes) and process only project ID 123. Once project 123 is processed, all other projects from the build with timestamp 2020_10_31_23_59 will be copied into the updated build.
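The flow described above could look roughly like the sketch below. It is an illustration only, not the repository's implementation: the function run_update_build, the process_project callback, the one-file-per-project layout, and the "updated" output folder name are all assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def run_update_build(prev_dir, new_dir, update_ids, process_project):
    """Reprocess only update_ids, then copy every other project from prev_dir."""
    prev_dir, new_dir = Path(prev_dir), Path(new_dir)
    new_dir.mkdir(parents=True, exist_ok=True)
    update_names = {f"{pid}.geojson" for pid in update_ids}

    # Step 1: rebuild the listed projects from the (already updated) input data.
    for pid in update_ids:
        process_project(pid, new_dir / f"{pid}.geojson")

    # Step 2: carry over all other projects unchanged from the previous build.
    copied = []
    for path in sorted(prev_dir.glob("*.geojson")):
        if path.name not in update_names:
            shutil.copy2(path, new_dir / path.name)
            copied.append(path.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the previous build folder, named by its timestamp.
    prev = Path(tmp) / "2020_10_31_23_59"
    prev.mkdir()
    (prev / "123.geojson").write_text("old 123")
    (prev / "456.geojson").write_text("old 456")

    new = Path(tmp) / "updated"
    # Hypothetical processing step: just writes a marker string per project.
    copied = run_update_build(
        prev, new, [123], lambda pid, out: out.write_text(f"new {pid}")
    )
    result = {p.name: p.read_text() for p in sorted(new.glob("*.geojson"))}
```

Here only project 123 is regenerated, while 456 is copied over from the earlier build untouched.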