aiddata / gcdf-geospatial-data

Repository for AidData's Geospatial Global Chinese Development Finance Dataset (GeoGCDF)
https://aiddata.org/china

Include recipient_region attribute in geojson? #28

Open cc50liu opened 1 year ago

cc50liu commented 1 year ago

I'd like to subset the data to projects in Africa, and am struggling to do so. Would you consider including a recipient_region attribute in the geojson so users can quickly subset the data based on that?

Here's why I'm asking: if I use the dataset in .xlsx format, there is a Recipient Region field I can use to limit observations to Africa. Then I need to convert the GeoJSON URL DL field into an sf object, ideally with one observation per GeoJSON URL DL. In R this works for a single observation (using either geojson_sf(geojson_url_dl) or read_sf(geojson_url_dl)), but I am struggling to do it for more than one.

The closest I've gotten is two_proj_sf <- lapply(as.list(two_proj$geojson_url_dl), function(x) geojson_sf(x)), which creates a list. When I try to convert that list into a data frame with two_proj_sf <- bind_rows(two_proj_sf, .id = "column_label"), I get an error because Implementation Start Year was read as character in the first element and as double in the second:

Error in bind_rows(): ! Can't combine 1$Implementation Start Year and 2$Implementation Start Year .
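One way around this type clash is to coerce the conflicting column to a single type in every list element before calling bind_rows(). The sketch below is an illustration under assumptions, not code from this thread: coerce_numeric() is a hypothetical helper, the two toy data frames stand in for the results of geojson_sf(), and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical helper (not from this thread): coerce the named columns of one
# data frame or sf object to numeric, so every list element has the same type.
# Values that do not parse as numbers (e.g. "") become NA.
coerce_numeric <- function(df, cols) {
  for (col in intersect(cols, names(df))) {
    df[[col]] <- suppressWarnings(as.numeric(as.character(df[[col]])))
  }
  df
}

# Toy stand-ins for two geojson_sf() results whose year column differs in type.
a <- data.frame(id = 1, `Implementation Start Year` = "2015", check.names = FALSE)
b <- data.frame(id = 2, `Implementation Start Year` = 2017, check.names = FALSE)

# Harmonize first, then bind; .id records which list element each row came from.
combined <- bind_rows(
  lapply(list(a, b), coerce_numeric, cols = "Implementation Start Year"),
  .id = "column_label"
)
print(combined)
```

On the real list, the equivalent call would be bind_rows(lapply(two_proj_sf, coerce_numeric, cols = "Implementation Start Year"), .id = "column_label"). bind_rows() fills columns missing from some elements with NA; if several columns clash, converting every attribute column to character first is a blunter alternative.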

Since I was struggling to do this with the .xlsx data, I switched to reading your complete GeoJSON file and filtering on its attributes instead. However, those attributes do not include Recipient Region, so I would need either a list of all African countries to filter on the Recipient field, or a geographic representation of the African continent to subset against.

I looked through your examples, but didn't notice any R code examples there.

I imported the geoBoundariesCGAZ_ADM0.topojson file, which is also available from AidData, to get country boundaries, but it does not include regions either, so I would still need a definitive list of African country ISO codes to subset using those boundaries.
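A list of African ISO3 codes can be taken from an external country-code table rather than built by hand. This is a sketch assuming the countrycode package is installed; its continent grouping is not necessarily identical to AidData's own Recipient Region coding, so borderline cases are worth checking.

```r
# Assumes the countrycode package is installed.
library(countrycode)

# countrycode::codelist is a data frame with one row per country; among its
# columns are iso3c (ISO 3166-1 alpha-3 code) and continent.
africa_iso3 <- codelist$iso3c[codelist$continent %in% "Africa"]

# Drop entries without an ISO3 code and de-duplicate.
africa_iso3 <- sort(unique(africa_iso3[!is.na(africa_iso3)]))
print(africa_iso3)
```

With the CGAZ boundaries read through sf::read_sf(), a subset could then look like boundaries[boundaries$shapeGroup %in% africa_iso3, ], assuming (unverified here) that shapeGroup holds each country's ISO3 code in that file.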

It occurs to me that I could use the .xlsx data to create a list of project ids I want to include and then subset the geojson data by those project ids. I'll work on that, but in the meantime also wanted to see if you have other suggestions or ideas, or could consider including the recipient region in the geojson file.
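The project-ID idea can be sketched as a join: take Recipient Region from the .xlsx, attach it to the GeoJSON attributes by project ID, then filter. This is a minimal base-R sketch under assumptions: the ID column names ("AidData Record ID" in the .xlsx, "id" in the GeoJSON) are guesses to check against the actual files, and small toy data frames stand in for data read with readxl::read_excel() and sf::read_sf().

```r
# Column names: the ID names are ASSUMPTIONS, check them against your files.
xlsx_id_col  <- "AidData Record ID"  # assumed ID column in the .xlsx
xlsx_reg_col <- "Recipient Region"   # field named in this issue
geo_id_col   <- "id"                 # assumed ID property in the GeoJSON

# Toy stand-ins for the .xlsx table and the GeoJSON attribute table.
projects_xlsx <- data.frame(c(101, 102, 103), c("Africa", "Asia", "Africa"))
names(projects_xlsx) <- c(xlsx_id_col, xlsx_reg_col)
geo <- data.frame(c(101, 102, 103, 104), c("a", "b", "c", "d"))
names(geo) <- c(geo_id_col, "title")

# Rename the lookup columns so the key matches the GeoJSON side.
regions <- projects_xlsx[, c(xlsx_id_col, xlsx_reg_col)]
names(regions) <- c(geo_id_col, "recipient_region")

# Left join: every feature keeps its row; IDs absent from the .xlsx get NA.
geo_joined <- merge(geo, regions, by = geo_id_col, all.x = TRUE)

# %in% treats NA as no match, so unmatched features are dropped here.
geo_africa <- geo_joined[geo_joined$recipient_region %in% "Africa", ]
print(geo_africa)
```

When geo is an sf object, merge() dispatches to sf's merge method and keeps the geometry column, so the result carries a recipient_region attribute alongside each feature.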

sgoodm commented 1 year ago

@cc50liu, I'd suggest one of two approaches for achieving your goal with the current data format:

1 - Join the raw data

2 - Dynamically load individual GeoJSONs and manage fields

I'll leave this issue open so we can incorporate region into the fields included with the GeoJSONs for the next release.

cc50liu commented 1 year ago

Thanks for your detailed answer @sgoodm, especially for the example code and ideas on how to move forward.