aiddata / gcdf-geospatial-data

Repository for AidData's Geospatial Global Chinese Development Finance Dataset (GeoGCDF)
https://aiddata.org/china

Projects present in v1.1.1 not in v2.0 #29

Open cc50liu opened 1 year ago

cc50liu commented 1 year ago

Using the GeoJSON download file, I notice that some projects that were present in the v1.1.1 dataset are no longer present. Is that intentional? If so, can you point me to where I can read about why they were removed? Here are two projects that are no longer available in v2.0:
30276 Donation of goods to victims of cyclone
762 Donation of anti-malaria drugs and establishment of a malaria lab

Similarly, some projects that had a lat/long in v1.1.1 are present in v2.0, but without GeoJSON data, so they are dropping off a map I am using to compare to previous research:
19694 Medicine/Med equipment in 4 clinics
19700 Medical equipment
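
For readers who want to reproduce this kind of comparison, here is a minimal sketch, not part of the original report. It assumes both releases are loaded as GeoJSON FeatureCollections and that each feature carries its project ID in a property named `id`. That property name is an assumption, as are the helper names `index_by_id` and `compare_versions`. The toy data reuses the four project IDs mentioned above with placeholder coordinates, plus a made-up ID 1 for a project that is unchanged.

```python
def index_by_id(feature_collection, id_key="id"):
    """Map project ID -> feature. `id_key` is an assumed property name."""
    return {f["properties"][id_key]: f for f in feature_collection["features"]}


def compare_versions(old_fc, new_fc, id_key="id"):
    """Return (dropped_ids, lost_geometry_ids) between two FeatureCollections."""
    old = index_by_id(old_fc, id_key)
    new = index_by_id(new_fc, id_key)
    # Projects present in the old release but absent from the new one.
    dropped = sorted(set(old) - set(new))
    # Projects present in both, but with geometry only in the old release.
    lost_geometry = sorted(
        pid
        for pid in set(old) & set(new)
        if old[pid].get("geometry") and not new[pid].get("geometry")
    )
    return dropped, lost_geometry


def toy_feature(pid, has_geometry=True):
    """Build a toy feature; the coordinates are placeholders, not real locations."""
    geometry = {"type": "Point", "coordinates": [0.0, 0.0]} if has_geometry else None
    return {"type": "Feature", "properties": {"id": pid}, "geometry": geometry}


# Toy stand-ins for the two releases (not the real files).
v1 = {"type": "FeatureCollection",
      "features": [toy_feature(p) for p in (30276, 762, 19694, 19700, 1)]}
v2 = {"type": "FeatureCollection",
      "features": [toy_feature(19694, False), toy_feature(19700, False), toy_feature(1)]}

dropped, lost_geometry = compare_versions(v1, v2)
print("dropped:", dropped)
print("lost geometry:", lost_geometry)
```

With real files, the two dictionaries would come from `json.load` on each download, and `id_key` would need to match whatever property the files actually use.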

sgoodm commented 1 year ago

@cc50liu

Yes, that is intentional. Unfortunately, these two versions of the data are generally not compatible for direct comparison. v1.1.1 was geocoded using a far less precise approach and, as a result, was able to include additional locations. In v2.0, projects that lacked sufficient location information to be geocoded precisely were not geocoded at all.

Specific projects may have also been dropped based on additional information acquired since the v1.1.1 release or changes in the methodology. Below is a quote from a colleague involved in preparing the project data.

In v2.0, we were aiming to provide the most precise features available on OSM for our data users. There could be many reasons that a project was available in v1.1.1 but not in v2.0. For example, if we were not able to locate a project at the precise location (for example, the actual geo-boundary of a feature), or the feature is not available on OSM, then we are not able to provide a feature for the project. We are also updating the development finance flows on a rolling basis, which can result in changes to, and even the removal of, a project record. For example, 30276 is classified as NGO Aid based on the TUFF 2.0 methodology (https://www.aiddata.org/publications/aiddata-tuff-methodology-version-2-0) and thus is not included in the published dataset.

My recommendation is that if a data user wants to study development finance at a precise level, the v2.0 GeoJSONs are the place to go, and they should also refer to the GCDF 2.0 dataset and the 2.0 methodology to construct their analysis. If they prefer to use v1.1.1, they should not use GCDF 2.0 for financial data, since many project records are outdated and do not correspond between the two versions.

The upcoming release of v3.0 this fall will aim to expand coverage across a broader range of projects for the years 2018-2020, including those without precise location information. A similar expansion of geospatial coverage for v2.0 projects (2000-2017) is planned for later on.

A key difference between the inclusion of less precise features in v3.0 vs v1.1.1 is that imprecise features will still be defined by established boundaries. E.g., in v1.1.1 a project which we only know took place somewhere in district X would simply have a centroid recorded at the center of district X. In v3.0, the boundary of district X will be recorded instead.

The use of centroids often resulted in users either ignoring the precision of coordinates associated with projects (e.g., conducting a district level analysis using projects geocoded at the country level) or entirely dropping projects that were not sufficiently precise. By providing exact boundaries even in cases where the precise location of a project is not known, we hope to facilitate more accurate usage of the geospatial data associated with projects.
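
To illustrate the centroid-versus-boundary point, here is a minimal sketch that is not taken from the dataset or its tooling. It treats a made-up rectangular "district X" as a GeoJSON Polygon in planar coordinates and computes its area and centroid with the shoelace formula. The helper name `ring_area_and_centroid` and the geometry are hypothetical. The centroid is a single point that says nothing about how large the district is, whereas the boundary keeps that information.

```python
def ring_area_and_centroid(ring):
    """Signed area and centroid of a closed ring (shoelace formula, planar coords)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = 0.5 * area2
    return area, (cx / (6 * area), cy / (6 * area))


# Hypothetical district X: a 4 x 2 rectangle, counter-clockwise, closed ring.
district_x = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [4, 0], [4, 2], [0, 2], [0, 0]]],
}

area, centroid = ring_area_and_centroid(district_x["coordinates"][0])

# Centroid-style record: one point, the extent of the district is lost.
centroid_record = {"type": "Point", "coordinates": list(centroid)}
# Boundary-style record: the polygon itself, so the extent stays available.
boundary_record = district_x

print(centroid_record, area)
```

Real boundaries are in longitude and latitude, so a proper area calculation would need a projection or a geodesic library. The planar arithmetic here only shows what a point record discards.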