aiddata / gcdf-geospatial-data

Repository for AidData's Geospatial Global Chinese Development Finance Dataset (GeoGCDF)
https://aiddata.org/china

Query reuse / caching #8

Open sgoodm opened 3 years ago

sgoodm commented 3 years ago

Current implementation:

My current implementation of this is very basic and intended solely to avoid requerying directions links to retrieve SVG path data. A path to a feature_df.csv from a previous run (containing the SVG path data and any initial processing for the links) can be provided, and any features available in that file are loaded instead of being reprocessed. This saves a lot of time, given how slow querying data for directions links currently is (see #1).
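As a rough illustration of the reuse step described above (not the repository's actual code), the sketch below loads a previous run's CSV and separates features that can be reused from those that still need a query. The column names `id` and `svg_path` and both function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def load_cached_features(csv_path: str, id_field: str = "id") -> Dict[str, dict]:
    """Read a previous run's feature CSV into a dict keyed by feature ID.

    The `id` column name is an assumption made for this sketch.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row[id_field]: row for row in csv.DictReader(f)}


def split_by_cache(
    features: Iterable[dict],
    cache: Dict[str, dict],
    id_field: str = "id",
) -> Tuple[List[dict], List[dict]]:
    """Return (reused, to_query).

    `reused` holds the cached rows for IDs found in the cache;
    `to_query` holds the input features that still need a fresh query.
    """
    reused: List[dict] = []
    to_query: List[dict] = []
    for feat in features:
        key = str(feat[id_field])
        if key in cache:
            reused.append(cache[key])
        else:
            to_query.append(feat)
    return reused, to_query
```

Only the features in `to_query` would then go through the slow directions-link query.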

Issue:

The current implementation would fail if the input data source changes, because that changes the unique IDs assigned to project-link or feature combinations and the cached rows would no longer match. It can also only use data from a single previous run.

Possible solutions:

This really depends on how far we want to go to deal with this. If querying directions links were faster, I would likely suggest we forgo this issue and just process the data fresh each build. That said, I could imagine cases where accessing cached data would be useful (e.g., OSM features changed and we want to use a specific version from an old build).

Ultimately I will likely leave this until the next update is needed and see what will be useful in practice based on data update patterns.
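For reference, one hypothetical direction (a sketch, not a decided design) would be to key cached SVG path data on the directions link itself rather than on the run-specific unique ID, and to merge caches from several previous runs. All names below are made up for illustration, and the only normalization applied to a link is stripping surrounding whitespace.

```python
import hashlib
from typing import Dict, Iterable, Optional


def link_key(link: str) -> str:
    """Stable cache key derived from the link text, not from a run-specific ID."""
    normalized = link.strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def merge_caches(runs: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge {link: svg_path} mappings from several runs.

    Runs are applied in order, so a later run overrides an earlier one
    when both contain the same link.
    """
    merged: Dict[str, str] = {}
    for run in runs:
        for link, svg_path in run.items():
            merged[link_key(link)] = svg_path
    return merged


def lookup(cache: Dict[str, str], link: str) -> Optional[str]:
    """Return cached SVG path data for a link, or None if it must be queried."""
    return cache.get(link_key(link))
```

Because the key depends only on the link, a change to the input data's unique IDs would not invalidate the cache. Pinning a specific old build would still require choosing which runs to pass to `merge_caches`.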