MattTriano / analytics_data_where_house

An analytics engineering sandbox focusing on real estate prices in Cook County, IL
https://docs.analytics-data-where-house.dev/
GNU Affero General Public License v3.0

Add functionality and data needed to calculate navigable routes between points. #60

Open MattTriano opened 1 year ago

MattTriano commented 1 year ago

PostGIS makes it very easy to calculate the distance between two points, but that's not very useful if I'm trying to evaluate the shortest safe walking distance between say, a house and a CTA train station.
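For context, the straight-line version is a one-liner in PostGIS. A minimal sketch (the coordinates are placeholders, not a real house or station; the geography cast makes the result come back in meters):

```sql
-- Straight-line ("as the crow flies") distance in meters between two points.
-- Both coordinate pairs below are hypothetical placeholders.
SELECT ST_Distance(
    ST_SetSRID(ST_MakePoint(-87.6598, 41.9228), 4326)::geography,  -- hypothetical house
    ST_SetSRID(ST_MakePoint(-87.6533, 41.9254), 4326)::geography   -- hypothetical CTA station
) AS distance_in_meters;
```

That number ignores the street grid entirely, which is exactly the limitation this issue is about.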

The postgres extension pgRouting is a rich and mature tool for applying routing and network analysis algorithms to graph data, and it would be quite easy to add that extension. The hard part of implementing this functionality would be preparing graph data that covers all public roadways in Chicago or Cook County and (ideally) also indicates where there are sidewalks. There are two data sources I know of (both below), from the Cook County and Chicago data portals respectively, and I could probably also get this data from OpenStreetMap.
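Once the extension and a street graph are in place, a shortest-path query would look roughly like the sketch below. This is only a sketch: it assumes an edge table shaped like the `ways` table that a tool such as osm2pgrouting generates (columns `gid`, `source`, `target`, `cost`, `reverse_cost`, `the_geom`) plus its companion `ways_vertices_pgr` vertex table, and none of that exists in this repo yet.

```sql
-- Step 1: find the network vertices nearest the (hypothetical) house and station.
WITH house AS (
    SELECT id FROM ways_vertices_pgr
    ORDER BY the_geom <-> ST_SetSRID(ST_MakePoint(-87.6598, 41.9228), 4326)
    LIMIT 1
), station AS (
    SELECT id FROM ways_vertices_pgr
    ORDER BY the_geom <-> ST_SetSRID(ST_MakePoint(-87.6533, 41.9254), 4326)
    LIMIT 1
)
-- Step 2: route between those vertices over the street graph.
SELECT seq, node, edge, cost, agg_cost
FROM pgr_dijkstra(
    'SELECT gid AS id, source, target, cost, reverse_cost FROM ways',
    (SELECT id FROM house),
    (SELECT id FROM station),
    false  -- undirected: for walking, treat every edge as two-way
);
```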

Street Shapefile Sources

MattTriano commented 1 year ago

To add the pgRouting extension to the PostGIS (dwh_db) container, I'll have to:

The initdb script only runs if the database doesn't exist, but my local database exists and has months of data in it. I could just 1) get a shell in the PostGIS dwh_db container, 2) use `psql` to install the extension into my local setup, and then 3) go through the above steps and hope that prod behaves the same as dev.
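For reference, the manual install in step 2 would amount to something like this sketch, which assumes the pgRouting packages are actually available in the PostGIS image (if they aren't, the image/Dockerfile would have to be updated first):

```sql
-- Run via psql inside the dwh_db container (the quick, manual route).
-- Assumes the pgrouting packages are present in the image.
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pgrouting;

-- Confirm the extension registered and check its version.
SELECT extname, extversion FROM pg_extension WHERE extname = 'pgrouting';
```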

But I don't want to go that route, as I want to minimize the differences between the system I develop on and prod. So I should implement a more automated way to back up databases [rather than 1) adding a mount point to the docker-compose.yml, 2) getting a shell in the PostGIS dwh_db container, 3) running `pg_dumpall -c -U $POSTGRES_USER -d $POSTGRES_DB > db_backup_$(date +"%Y_%m_%d__%H%M")`, and 4) moving that backup file to a location visible on the host machine through the mount point] and to reload/COPY that data back into a db.