CodeForPittsburgh / food-access-map-data

Data for the food access map
MIT License

Latitude/Longitude Sanity checking #119

Closed wkbraid closed 3 years ago

wkbraid commented 3 years ago

Looking at merged_dataset.csv in QGIS, you can see some notable outliers.

[Image: The sea off the coast of Ghana is (0,0), which is pretty unsurprising; I'm less sure what's going on with mainland China.]

In particular I stumbled across "Green Grocer/East Hills Community Center" at "2291 Wilner Drive" which lists latitude/longitude as (0,0).

[Image: I would guess that these are correct data points, just not in Allegheny County.]

This probably doesn't affect the end functionality, since these points will just not be displayed on the map. But it is probably indicative that something is going wrong.

maxachis commented 3 years ago

Could be the result either of erroneous Geocoding or of the data sources making an error.

Could do a couple of things here. One, in case the data sources are making the error, would be to check whether the coordinates fall outside the general bounds of Allegheny County (e.g. a longitude outside roughly 78°W to 82°W) and, if so, geocode the street address to see if that corrects them.

Alternatively, if it's the Geocoder itself causing the issues, we could add a way to make an alert that we are getting weird coordinates from the Geocoder -- how that would work, and where the alert would be seen, is up for debate.
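One way the alert idea could look, as a minimal sketch only: log a warning whenever a geocoded point lands outside a rough box around the county. The function name `warn_if_suspicious`, the logger name, and the box values (about 40 to 41 degrees latitude, -81 to -79 degrees longitude) are assumptions for illustration, not the project's actual code.

```python
import logging

logger = logging.getLogger("geocoder_checks")  # hypothetical logger name

# Assumed rough bounding box around Allegheny County, in degrees.
LAT_RANGE = (40.0, 41.0)
LON_RANGE = (-81.0, -79.0)


def warn_if_suspicious(name, lat, lon):
    """Log a warning and return True when a geocoded point is outside the box."""
    in_box = (
        LAT_RANGE[0] <= lat <= LAT_RANGE[1]
        and LON_RANGE[0] <= lon <= LON_RANGE[1]
    )
    if not in_box:
        logger.warning("Suspicious coordinates for %r: (%s, %s)", name, lat, lon)
    return not in_box
```

Where the warnings end up (console, a log file, a CI report) depends on how the logger is configured, which is the part still up for debate.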

Alternatively alternatively, the one in mainland China actually makes perfect sense, because one of our food sources, in an attempt to get a cheap supply of fresh food, successfully dug a hole to China and has set up their receiving warehouse in the Guangdong province.

maxachis commented 3 years ago

Resolved: Update Geocoder to check if Lat/Long = 0 in addition to checking if Lat/Long is missing.

Also add boundary checkers: <75 and >85 for Longitude (degrees west), and figure out the part for Latitude later. I believe in you, Max.
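The missing-or-zero check could be sketched roughly as follows. This assumes coordinates arrive as raw CSV cells (strings, numbers, or empty values) and treats a zero in either coordinate as suspicious; the helper names `parse_coord` and `is_missing_or_zero` and the list of "missing" markers are hypothetical.

```python
import math


def parse_coord(raw):
    """Convert a raw CSV cell to a float, or None when it is unusable."""
    if raw is None:
        return None
    text = str(raw).strip()
    # Common spellings of "missing" in CSV exports (an assumed list).
    if text == "" or text.upper() in {"NA", "NAN", "NULL", "NONE"}:
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    return None if math.isnan(value) else value


def is_missing_or_zero(lat_raw, lon_raw):
    """True when either coordinate is missing or exactly zero."""
    lat, lon = parse_coord(lat_raw), parse_coord(lon_raw)
    return lat is None or lon is None or lat == 0 or lon == 0
```

Rows for which `is_missing_or_zero` returns True would be the ones handed to the geocoder.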

maxachis commented 3 years ago

I've started working on a new variant of the Geocoder. Updates will be as follows:

Geocode not only rows where lat/long values are null or 0, but also rows where lat values are outside the range [40, 41] or long values are outside the range [-81, -79]. Those ranges encompass the whole of Allegheny County as well as a good chunk of the surrounding area.

Additionally (and this one we may want to debate), add a step so that rows whose lat/long values are STILL outside the above boundaries after geocoding are excluded from the final geocoded dataset, because they're not relevant for our work. Maybe we want to list these rows separately so we can address them later, or maybe we just want to ignore them. The choice is ours.
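The two proposed steps can be sketched together as below. This is an illustration under assumptions, not the actual Geocoder: rows are plain dicts, the column names `latitude` and `longitude` are assumed, and the box is taken as roughly 40 to 41 degrees latitude and -81 to -79 degrees longitude (Allegheny County sits near 40.4, -80.0).

```python
# Assumed bounding box in degrees: the county plus some surrounding area.
LAT_MIN, LAT_MAX = 40.0, 41.0
LON_MIN, LON_MAX = -81.0, -79.0


def in_bounds(lat, lon):
    """True when both coordinates are present and inside the box."""
    if lat is None or lon is None:
        return False
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def needs_geocoding(row):
    """Step 1: null, zero, or out-of-range coordinates trigger geocoding."""
    # (0, 0) is already outside the box, so in_bounds covers the zero case.
    return not in_bounds(row.get("latitude"), row.get("longitude"))


def split_after_geocoding(rows):
    """Step 2: keep in-bounds rows; collect the rest for later review."""
    kept, excluded = [], []
    for row in rows:
        inside = in_bounds(row.get("latitude"), row.get("longitude"))
        (kept if inside else excluded).append(row)
    return kept, excluded


# Made-up example rows, not taken from merged_dataset.csv.
rows = [
    {"name": "Example A", "latitude": 40.44, "longitude": -79.99},
    {"name": "Example B", "latitude": 0.0, "longitude": 0.0},
    {"name": "Example C", "latitude": None, "longitude": None},
    {"name": "Example D", "latitude": 23.1, "longitude": 113.3},
]
to_geocode = [r["name"] for r in rows if needs_geocoding(r)]
kept, excluded = split_after_geocoding(rows)
```

Returning the excluded rows instead of silently dropping them leaves both options open: they can be written to a separate file for follow-up or simply ignored.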

maxachis commented 3 years ago

First update added: it geocodes rows whose lat/long values fall outside the Allegheny range, along with a unit test.

See Pull Request #124