cityofaustin / atd-micromobility-api

A dockless mobility data API built with Python/Sanic

Back back end #28

Closed: johnclary closed this 5 years ago

johnclary commented 5 years ago

Starting to refactor the API to test the feasibility of querying Socrata (the City of Austin Open Data Portal) for trip records.

Under this model, we replace the part of the app logic that aggregates trip counts from in-memory JSON data with a routine that fetches grouped trip counts from Socrata. Socrata hosts the individual trip records and supports SQL-like querying, so this opens up the possibility of querying by datetime (or any other trip property). The app will translate the Socrata query results into a GeoJSON grid + trip count response, returning a payload in the same format it returns today. In other words, this can be done without a breaking change to the API.
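The translation step can be sketched roughly as follows: join each grouped Socrata row onto the matching grid cell geometry by cell id and emit a GeoJSON FeatureCollection. This is a minimal sketch, not the app's actual code; the function name `to_feature_collection`, the column names `cell_id` and `trip_count`, and the output property `trips` are all hypothetical. It also assumes Socrata returns aggregate counts as strings in its JSON output, so it casts them to `int`.

```python
from typing import Any, Dict, List


def to_feature_collection(
    grid: Dict[str, Dict[str, Any]],
    rows: List[Dict[str, str]],
    cell_field: str = "cell_id",      # hypothetical column name
    count_field: str = "trip_count",  # hypothetical alias for count(*)
) -> Dict[str, Any]:
    """Join grouped Socrata rows onto grid cell geometries by cell id."""
    features = []
    for row in rows:
        cell_id = row.get(cell_field)
        cell = grid.get(cell_id)
        if cell is None:
            # Skip rows with a null cell id or a cell that isn't in the grid.
            continue
        features.append({
            "type": "Feature",
            "geometry": cell["geometry"],
            # Aggregates are assumed to arrive as strings, so cast to int.
            "properties": {"id": cell_id, "trips": int(row[count_field])},
        })
    return {"type": "FeatureCollection", "features": features}
```

Because the response is rebuilt from the grid plus the counts, the payload shape stays the same as long as the feature properties match what clients already read.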

The primary technical concern is that the app will now make an HTTP request behind the scenes, which will definitely add latency: hopefully just a second or two. We'll see.

johnclary commented 5 years ago

OK. The refactored trip compiler uses Socrata SoQL queries to fetch aggregated trip counts from the Dockless Vehicle Trips dataset on the City's Open Data Portal.
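A query of this kind might look like the sketch below, which builds SoQL parameters that count trips per grid cell, optionally within a datetime window. The dataset identifier `DATASET_ID` is a placeholder and the column names `origin_cell_id` and `start_time` are hypothetical; substitute the real ones from the dataset. `$select`, `$group`, `$where`, and `$limit` are standard SoQL query parameters.

```python
from datetime import datetime
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real dataset identifier.
BASE_URL = "https://data.austintexas.gov/resource/DATASET_ID.json"


def _soql_timestamp(value: str) -> str:
    # Parsing and re-formatting rejects anything that isn't an ISO date,
    # so user input can't inject arbitrary SoQL into the $where clause.
    # Any UTC offset is dropped (assumes floating local timestamps).
    return datetime.fromisoformat(value).strftime("%Y-%m-%dT%H:%M:%S")


def build_trip_count_params(
    cell_field: str = "origin_cell_id",  # hypothetical column names
    time_field: str = "start_time",
    start: Optional[str] = None,
    end: Optional[str] = None,
    limit: int = 50000,
) -> Dict[str, str]:
    """SoQL parameters that count trips per cell, optionally within [start, end)."""
    params = {
        "$select": f"{cell_field}, count(*) AS trip_count",
        "$group": cell_field,
        "$limit": str(limit),
    }
    clauses = []
    if start is not None:
        clauses.append(f"{time_field} >= '{_soql_timestamp(start)}'")
    if end is not None:
        clauses.append(f"{time_field} < '{_soql_timestamp(end)}'")
    if clauses:
        params["$where"] = " AND ".join(clauses)
    return params


def build_trip_count_url(**kwargs) -> str:
    return f"{BASE_URL}?{urlencode(build_trip_count_params(**kwargs))}"
```

Grouping on the server means only one row per cell comes back over the wire rather than every trip record. The sketch sets `$limit` explicitly because Socrata endpoints return a limited number of rows by default.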

@mateoclarke @sergiogcx, give this a test run. I'm finding performance to be decent: it's noticeably slower than the previous implementation, but even very large queries return in only a few seconds.

Worth noting that with this change, grid.json no longer contains trip counts; it only contains grid cell geometries. (Technically, this is not a GeoJSON file. The top level of the JSON is an associative array that contains GeoJSON features indexed by their id property, which makes looking them up much faster.) The new grid.json also has smaller hex cells (500 ft per edge) and covers the entire City of Austin full-purpose jurisdiction, bringing its file size up to ~8 MB.
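An id-indexed grid file like this could be produced from an ordinary GeoJSON FeatureCollection along the lines of the sketch below. It assumes each feature stores its id in `properties["id"]`; the function names `index_features_by_id` and `write_grid` are hypothetical and not taken from the repo.

```python
import json
from typing import Any, Dict


def index_features_by_id(collection: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Convert a GeoJSON FeatureCollection into a {id: feature} mapping."""
    index = {}
    for feature in collection["features"]:
        # JSON object keys must be strings, so normalize the id to str.
        index[str(feature["properties"]["id"])] = feature
    return index


def write_grid(collection: Dict[str, Any], path: str) -> None:
    # The written file is a plain JSON object, not a valid GeoJSON document.
    with open(path, "w") as f:
        json.dump(index_features_by_id(collection), f)
```

Once loaded, finding a cell is a single dictionary lookup (for example `grid["42"]`, average O(1)) instead of a linear scan over a list of features, which matters when joining thousands of query rows against the grid.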

johnclary commented 5 years ago

Fixes #11

mateoclarke commented 5 years ago

Was able to pull this locally, and though I noticed some lag, I think it's performant enough that we can get away with it without any complaints. I also love that this opens the door to time-based queries, which would be great for analysis.