aiddata / gcdf-geospatial-data

Repository for AidData's Geospatial Global Chinese Development Finance Dataset (GeoGCDF)
https://aiddata.org/china

Use route APIs instead of web scraping directions links #1

Status: Open (opened by sgoodm 3 years ago)

sgoodm commented 3 years ago

Current approach:

We currently use a headless browser (implemented with Selenium and Firefox) to load the JavaScript web map behind OpenStreetMap (OSM) directions links. This lets us extract the route's SVG path from the web map and then use the route's start and end coordinates to georeference that path and create a GeoJSON.
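The georeferencing step can be sketched roughly as follows. This is a hypothetical helper, not the repository's actual code. It assumes the web map is a north-up Web Mercator map (as OSM's map is), that the SVG path has already been parsed into a list of (x, y) vertices, and that the first and last vertices correspond to the route's start and end. Two point pairs are enough to fix a similarity transform (scale, rotation, translation), which is solved here with complex arithmetic in Web Mercator meters before converting back to longitude/latitude.

```python
import math

R = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857), in meters


def lonlat_to_merc(lon, lat):
    """Project WGS84 lon/lat (degrees) to Web Mercator x/y (meters)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def merc_to_lonlat(x, y):
    """Inverse of lonlat_to_merc."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


def georeference_svg_path(svg_points, start_lonlat, end_lonlat):
    """Map SVG pixel vertices onto the route's start/end coordinates.

    Hypothetical helper: assumes a north-up Web Mercator map and that
    svg_points[0] / svg_points[-1] are the route's start / end.
    """
    # SVG y grows downward; negate it so both spaces have y pointing north.
    z = [complex(px, -py) for px, py in svg_points]
    w_start = complex(*lonlat_to_merc(*start_lonlat))
    w_end = complex(*lonlat_to_merc(*end_lonlat))
    # Similarity transform w = a*z + b, fixed by the two endpoint pairs.
    a = (w_end - w_start) / (z[-1] - z[0])
    b = w_start - a * z[0]
    coords = []
    for p in z:
        q = a * p + b
        coords.append(list(merc_to_lonlat(q.real, q.imag)))
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "LineString", "coordinates": coords},
    }
```

Because only two control points are used, any error in identifying the start and end vertices propagates to the whole route, which is one reason a direct API geometry is attractive.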

Issue:

The basic Selenium implementation does not appear to be thread-safe and can produce a range of errors when parallelized. As a result, all SVG path data is extracted serially, which is by far the longest-running portion of the build: it takes several hours, whereas the remaining, parallelized portion can finish in under an hour.

Possible solution:

The SVG route data produced by the directions links is generated by either the OSRM or the GraphHopper routing service, through their APIs. See below for an example OSM directions link and the corresponding API call for each service.

OSRM

GraphHopper
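A minimal sketch of building the two API requests directly, using only the standard library. The base URLs (the public OSRM demo server and GraphHopper's hosted Routing API), the `driving`/`car` profiles, and the `api_key` argument are assumptions for illustration, not necessarily the hosts or keys that OSM's directions page uses. Note that the coordinate order differs: OSRM takes `lon,lat` pairs in the URL path, while GraphHopper takes repeated `point=lat,lon` query parameters.

```python
from urllib.parse import urlencode

# Assumed endpoints; OSM's directions page may call different hosts.
OSRM_BASE = "https://router.project-osrm.org/route/v1/driving"
GRAPHHOPPER_BASE = "https://graphhopper.com/api/1/route"


def osrm_route_url(start, end, base=OSRM_BASE):
    """Build an OSRM route request; start/end are (lon, lat) tuples."""
    # OSRM expects "lon,lat;lon,lat" in the path.
    coords = f"{start[0]},{start[1]};{end[0]},{end[1]}"
    # Ask for the full-resolution geometry as GeoJSON.
    query = urlencode({"overview": "full", "geometries": "geojson"})
    return f"{base}/{coords}?{query}"


def graphhopper_route_url(start, end, api_key, profile="car", base=GRAPHHOPPER_BASE):
    """Build a GraphHopper route request; start/end are (lon, lat) tuples."""
    # GraphHopper expects repeated point=lat,lon parameters.
    params = [
        ("point", f"{start[1]},{start[0]}"),
        ("point", f"{end[1]},{end[0]}"),
        ("profile", profile),
        ("key", api_key),
    ]
    return f"{base}?{urlencode(params)}"
```

With `geometries=geojson`, the OSRM response's `routes[0]["geometry"]` should already be a GeoJSON LineString, while the GraphHopper response's `paths[0]["points"]` holds an encoded polyline by default.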

Implementation discussion:

  1. OSRM can return the route geometry directly as GeoJSON, but GraphHopper will only return an encoded polyline that must be decoded.

  2. We would need to explore further to determine whether paid API keys are required or whether the keys used by OSM are acceptable for public use. (Note: if they are not, and I need to remove the keys from the links above, please let me know.) Since OSM does not hide them from publicly viewable queries, I am assuming they are intended for public use.

  3. Would the volume or rate of API calls be an issue? (It is currently on the order of a few hundred calls per build.)

  4. How fast are the API responses? Given the parallelization issues with the SVG-based implementation, I expect this will be a net improvement even if the API is not very fast.
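For item 1, GraphHopper's encoded polyline follows the same scheme as Google's encoded polyline algorithm: coordinates are scaled, delta-encoded, and packed into printable characters. A minimal decoder might look like the following. It assumes 2D points (no elevation requested) and the default precision of 5 decimal places, and the function names are hypothetical.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline into a list of (lat, lon) tuples."""
    coords = []
    index = lat = lon = 0
    factor = 10 ** precision
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one delta for lat, then one for lon
            shift = result = 0
            while True:
                b = ord(encoded[index]) - 63
                index += 1
                result |= (b & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if b < 0x20:  # no continuation bit: value is complete
                    break
            # Undo the zig-zag sign encoding.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords


def polyline_to_geojson(encoded, precision=5):
    """Wrap decoded points as a GeoJSON LineString ([lon, lat] order)."""
    points = decode_polyline(encoded, precision)
    return {"type": "LineString", "coordinates": [[lon, lat] for lat, lon in points]}
```

For example, the string `_p~iF~ps|U_ulLnnqC_mqNvxq`@` from Google's documentation decodes to the points (38.5, -120.2), (40.7, -120.95), and (43.252, -126.453).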
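For items 3 and 4, plain HTTP calls, unlike the headless browser, can be issued from a thread pool. The following is a sketch under stated assumptions. `fetch` is any function that takes a URL and returns the parsed response (for example, one wrapping `urllib.request.urlopen` and `json.load`). The hypothetical `RateLimiter` spaces requests at least `min_interval` seconds apart across all threads, so a few hundred calls per build can be throttled to whatever limit each provider actually imposes, which would still need to be confirmed.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Hypothetical limiter: start at most one call per min_interval seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_time = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            delay = self._next_time - now
            self._next_time = max(now, self._next_time) + self.min_interval
        if delay > 0:
            time.sleep(delay)


def fetch_all(urls, fetch, max_workers=8, min_interval=0.1):
    """Fetch URLs concurrently, rate-limited; results keep the input order."""
    limiter = RateLimiter(min_interval)

    def task(url):
        limiter.wait()
        return fetch(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, urls))
```

Keeping `fetch` injectable makes it easy to swap in a retrying or caching client later, and to test the pipeline without hitting the live services.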