datamade / how-to

đź“š Doing all sorts of things, the DataMade way
MIT License
88 stars 12 forks source link

Make & Postgis Project (Fatima) #201

Closed fgomez828 closed 2 years ago

fgomez828 commented 3 years ago

This is an issue for me to track my progress on a small project using Make and Postgis. The question I will be trying to answer is: how much of a given city is within two miles of a McDonald’s?

hancush commented 3 years ago

Forest to review.

fgomez828 commented 3 years ago

here's the link to the repo where I'm working on this project: https://github.com/fatima3558/walk-to-mcd

hancush commented 3 years ago

Some examples of SQL ETL pipelines:

Cc @fgregg. Any other good contemporary models for a pipeline that does much of the work in SQL?

hancush commented 3 years ago

@fatima3558 got additional resources from @fgregg, will review those resources and update this thread with a concrete next step.

fgomez828 commented 3 years ago

@fgregg here are the steps I think are next based on your suggestions from the last time we talked about this project:

Am I missing anything?

fgregg commented 3 years ago

I would suggest not doing this with GeoDjango, at least to start with. It's another level of abstraction and hard to follow.

I would recommend your first step is to try to get your GeoJSON file directly into a postgis database with ogr2ogr. (example https://github.com/datamade/thrive-zones/blob/839438cb4bf4d3f3dbf6e7ae85204a12c5f921ed/data/Makefile#L13)

Some good steps would be

hancush commented 3 years ago

@fgregg is going to add some examples of geographic queries to this issue.

fgomez828 commented 3 years ago

@fgregg made headway on the query necessary to calculate the percentage of Chicago within 2 miles of a Starbucks. I was going to start adding the total of the intersections between the Chicago area with the 2mile buffer around each Starbucks location, but I noticed every area value is identical. Here are the query and a few records:

select starbucks."addr:full", starbucks."addr:city", st_area(st_intersection(st_buffer(starbucks.geom, 2 * 1609.344),chicago.geom)) from starbucks inner join chicago on st_intersects(starbucks.geom, chicago.geom);

           addr:full            | addr:city |       st_area
--------------------------------+-----------+---------------------
 6332 Northwest Highway         | Chicago   | 0.06495417436755024
 8500 W Bryn Mawr Ave           | Chicago   | 0.06495417436755024
 Circular Building #6           | Chicago   | 0.06495417436755024
 Circular Building #6           | Chicago   | 0.06495417436755024
 Circular Building #6           | Chicago   | 0.06495417436755024
 10000 West O'Hare Ave          | Chicago   | 0.06495417436755024
 Circular Building #6           | Chicago   | 0.06495417436755024
 3506 S. State St.              | Chicago   | 0.06495417436755024
 3857 S. Dr Martin Luther King  | Chicago   | 0.06495417436755024
 3108 South Halsted             | Chicago   | 0.06495417436755024

Beyond the fact that there are duplicate starbucks locations here, as best I can tell when I scroll through the whole list there are apparently no starbucks near the edges of Chicago's boundary, which is possible but seems suspicious. Any thoughts on how I can double-check this? I did a quick check against google maps, but I'm not sure if there's a more reliable way to verify this.

I'm also a bit confused about units here -- when I run a practice query I did to build up to the above more complex one:

select starbucks."addr:full", starbucks."addr:city", st_area(st_buffer(starbucks.geom, 2 * 1609.344)) from starbucks limit 1;

        addr:full         | addr:city |      st_area
--------------------------+-----------+--------------------
 4-831 Kuhio Highway #208 | Kapaa     | 32338023.325657208

The units are in square meters, and when I convert this to square miles it is about 12.5 square miles (google conversion here) which sounds right to me for a circle with radius 2 miles.

I'm not sure what's happening at the level of units with the more complex query.

fgomez828 commented 2 years ago

@fgregg here's the final repo with a complete Makefile & recipes to download files, create & prep the database, and run the central query: https://github.com/fatima3558/walk-to-mcd/pull/7

NOTE: I didn't get a chance to put this together with Docker in mind, so you can run the commands without docker-compose

fgomez828 commented 2 years ago

Got map previews! Map of Chicago within 2 miles of a Starbucks

Map of Chicago NOT within 2 miles of a Starbucks