Closed fgomez828 closed 2 years ago
Forest to review.
here's the link to the repo where I'm working on this project: https://github.com/fatima3558/walk-to-mcd
Some examples of SQL ETL pipelines:
Cc @fgregg. Any other good contemporary models for a pipeline that does much of the work in SQL?
@fatima3558 got additional resources from @fgregg, will review those resources and update this thread with a concrete next step.
@fgregg here are the steps I think are next based on your suggestions from the last time we talked about this project:
Am I missing anything?
I would suggest not doing this with GeoDjango, at least to start with. It's another level of abstraction and hard to follow.
I would recommend your first step is to try to get your GeoJSON file directly into a postgis database with ogr2ogr. (example https://github.com/datamade/thrive-zones/blob/839438cb4bf4d3f3dbf6e7ae85204a12c5f921ed/data/Makefile#L13)
Some good steps would be
@fgregg is going to add some examples of geographic queries to this issue.
@fgregg made headway on the query necessary to calculate the percentage of Chicago within 2 miles of a Starbucks. I was going to start adding the total of the intersections between the Chicago area with the 2mile buffer around each Starbucks location, but I noticed every area value is identical. Here are the query and a few records:
select starbucks."addr:full", starbucks."addr:city", st_area(st_intersection(st_buffer(starbucks.geom, 2 * 1609.344),chicago.geom)) from starbucks inner join chicago on st_intersects(starbucks.geom, chicago.geom);
addr:full | addr:city | st_area
--------------------------------+-----------+---------------------
6332 Northwest Highway | Chicago | 0.06495417436755024
8500 W Bryn Mawr Ave | Chicago | 0.06495417436755024
Circular Building #6 | Chicago | 0.06495417436755024
Circular Building #6 | Chicago | 0.06495417436755024
Circular Building #6 | Chicago | 0.06495417436755024
10000 West O'Hare Ave | Chicago | 0.06495417436755024
Circular Building #6 | Chicago | 0.06495417436755024
3506 S. State St. | Chicago | 0.06495417436755024
3857 S. Dr Martin Luther King | Chicago | 0.06495417436755024
3108 South Halsted | Chicago | 0.06495417436755024
Beyond the fact that there are duplicate starbucks locations here, as best I can tell when I scroll through the whole list there are apparently no starbucks near the edges of Chicago's boundary, which is possible but seems suspicious. Any thoughts on how I can double-check this? I did a quick check against google maps, but I'm not sure if there's a more reliable way to verify this.
I'm also a bit confused about units here -- when I run a practice query I did to build up to the above more complex one:
select starbucks."addr:full", starbucks."addr:city", st_area(st_buffer(starbucks.geom, 2 * 1609.344)) from starbucks limit 1;
addr:full | addr:city | st_area
--------------------------+-----------+--------------------
4-831 Kuhio Highway #208 | Kapaa | 32338023.325657208
The units are in square meters, and when I convert this to square miles it is about 12.5 square miles (google conversion here) which sounds right to me for a circle with radius 2 miles.
I'm not sure what's happening at the level of units with the more complex query.
@fgregg here's the final repo with a complete Makefile & recipes to download files, create & prep the database, and run the central query: https://github.com/fatima3558/walk-to-mcd/pull/7
NOTE: I didn't get a chance to put this together with Docker in mind, so you can run the commands without docker-compose
This is an issue for me to track my progress on a small project using Make and Postgis. The question I will be trying to answer is: how much of a given city is within two miles of a McDonald’s?