codeforboston / pantry_pickup

Combining city data with a list of specific needs from food pantries, the Pantry Pick-Up App helps citizens make the most useful and needed donations.
http://www.pantrypickup.org

Verify geocoded addresses #4

Open eucalyptustree opened 11 years ago

eucalyptustree commented 11 years ago

Verify the geocoded addresses (in /files/geocode-inprogress.xlsx) and fix any errors.

carpeliam commented 11 years ago

Ah yeah I just noticed that markers for pantries such as "Sacred Heart Tree of Life Pantry", "Salvation Army Attleboro Food Pantry", "St. Paul R.C. Church Dollar-A-Bag", and "Soup Kitchen In Provincetown, Inc." are showing up in the ocean, but others look good.

WheresHJ commented 11 years ago

Not convinced that geocoding is correct - for example, clicking on the following shows me a map location in Methuen.

ABC - People's Baptist Church Food Pantry 134 Camden St. Roxbury, MA 2118

[Screenshot, 2013-06-04 at 9:48:46 AM]

carpeliam commented 11 years ago

Yep, some locations are definitely questionable.

[Image: pantry pickup]

Though one of the Provincetown locations, and a Nantucket location, seem legitimate. I'm thinking maybe we want to try another source, instead of testing the results individually?

Here's an idea for getting the geolocation results from the Google Maps API, which I think was accurate: remove all the location data, and rig the search to only return results that don't have a lat/lng. Use the Google Maps API for geolocation as we were doing before; it'll give some results, and then fail on a bunch of others due to rate limiting. Copy the results it gives back into the db, and repeat until everything has a value. Then reset the search and remove Google Maps geolocation. Does that sound good?

JBaldachino commented 11 years ago

Well, presumably new locations that are added should not require manual geocoding.

1.) Clear all existing geocoding

2.) Some route on the application will trigger a query that returns pantries without a loc. Something like {'loc':{$exists:false}}. We can limit the query to the Google Maps rate limit to prevent it from making a gazillion useless requests.

3.) For each pantry in the result, it sends a geocode request to the Google Maps API and stores the result.

On a "going forward" basis the application would only have to ping the API with new pantries. "Catching up" can just happen over time, with the application geocoding existing locations until the API rate limits, rinse and repeat until done.
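The catch-up pass described above could be sketched roughly as follows. This is an illustration in Python, not the app's actual code: the pantries are plain dicts standing in for database records, and `geocode`, `RateLimited`, and `batch_limit` are hypothetical names for whatever wraps the Google Maps call, signals a rate limit, and caps the query.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coords = Tuple[float, float]  # (lat, lng)


class RateLimited(Exception):
    """Hypothetical signal that the geocoding API refused further requests."""


def backfill_missing_locations(
    pantries: List[Dict],
    geocode: Callable[[str], Optional[Coords]],
    batch_limit: int,
) -> int:
    """Geocode up to batch_limit pantries that have no 'loc' yet.

    Returns the number of pantries updated in this pass.
    """
    # In-memory stand-in for the {'loc': {$exists: false}} query, capped at batch_limit.
    missing = [p for p in pantries if "loc" not in p][:batch_limit]
    updated = 0
    for pantry in missing:
        try:
            coords = geocode(pantry["address"])
        except RateLimited:
            break  # stop this pass; a later pass picks up the remaining pantries
        if coords is not None:
            pantry["loc"] = coords
            updated += 1
    return updated
```

Calling this repeatedly (for example from a route or a scheduled job) would drain the backlog a batch at a time. Note that an address the geocoder cannot resolve stays without a loc and is retried on every pass, so a real version would probably want to mark those separately.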

Once we have an actual "add new pantry" workflow, we would just attach the geocode request to the act of submission. If we ever get so many pantries being added per hour by users that we hit the rate limit, well, we'll have other problems.

eucalyptustree commented 11 years ago

The current geocoding should not be relied on, sorry about that. I'm working on cleaning it up now. If others are interested in working on data issues, get in touch with me and I'll point you to what I've done so far.

see also /files/geocoding-inprogress.xlsx and /files/geocoding-readme.txt

eucalyptustree commented 11 years ago

Should explain more, sorry. The current geocoding was done sloppily and without close checking. The batch geocoder I used spat out more results than we have data for (450 geocoded addresses vs. 325 rows), so I did a vlookup to match real rows with the geocoded output. I'm going back through now to fix them up, but am first doing a visual inspection of the data we have (and finding duplicates, cleaning up spelling errors, sorting out PO Box issues, etc.).
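For anyone who would rather script the matching than do it in a spreadsheet, here is a rough Python sketch of the same idea: join the real rows to the geocoder output on a normalized address, and set aside duplicates and unmatched rows for manual review. The field names (`address`, `lat`, `lng`) and the normalization rule are assumptions, not the layout of the actual xlsx file.

```python
import re
from typing import Dict, List, Tuple


def normalize(address: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so near-identical addresses compare equal."""
    cleaned = re.sub(r"[^\w\s]", "", address.lower())
    return " ".join(cleaned.split())


def match_rows(
    rows: List[Dict], geocoded: List[Dict]
) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    """Return (matched, unmatched, duplicates) for the real rows."""
    lookup: Dict[str, Tuple[float, float]] = {}
    for g in geocoded:
        # Keep the first geocoded result seen for each normalized address.
        lookup.setdefault(normalize(g["address"]), (g["lat"], g["lng"]))

    matched, unmatched, duplicates = [], [], []
    seen = set()
    for row in rows:
        key = normalize(row["address"])
        if key in seen:
            duplicates.append(row)  # same address already appeared in an earlier row
            continue
        seen.add(key)
        if key in lookup:
            lat, lng = lookup[key]
            matched.append({**row, "lat": lat, "lng": lng})
        else:
            unmatched.append(row)  # needs manual review or another geocoding attempt
    return matched, unmatched, duplicates
```

An exact-match join like this will not catch spelling errors, so the unmatched list is where the visual inspection would still be needed.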

carpeliam commented 11 years ago

Our server-side search component now involves geocoding based on the search string. We could certainly write a script to go through all of the records and do a location lookup again.

As @JBaldachino mentioned, we're going to have to bake this into our "add a pantry to the app" workflow. It'd be super-simple to have a fail routine that just calls the lookup function again after a few seconds.
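A fail routine along those lines might look something like this minimal Python sketch. `lookup` is a hypothetical stand-in for the app's lookup function, and the attempt count and delay are placeholder values; `sleep` is a parameter only so the wait can be swapped out in tests.

```python
import time
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]  # (lat, lng)


def lookup_with_retry(
    lookup: Callable[[str], Coords],
    address: str,
    attempts: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Coords]:
    """Call lookup(address); on failure wait and try again, up to `attempts` times.

    Returns the coordinates, or None if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            return lookup(address)
        except Exception:
            # Broad catch for the sketch; a real version would catch only
            # the errors the geocoding client actually raises.
            if attempt < attempts - 1:
                sleep(delay_seconds)  # wait a few seconds before the next try
    return None
```

Capping the number of attempts keeps a permanently bad address from retrying forever.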

ohnorobo commented 11 years ago

Redid most of the geocoding using Google's geocoder. We no longer have pantries in the ocean or South America (though there's still a PO box that thinks it's in Spain). New file is pantries.geocoded.csv. If you look at files/log you should see which of them failed to re-geocode and why.
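One rough way to catch far-off outliers like the PO box in Spain without eyeballing every marker is a bounding-box check. The Python sketch below is only an illustration: the box is an approximate rectangle around Massachusetts (my own rounded values, not from the data), and the `lat`/`lng` field names are assumed.

```python
from typing import Dict, List

# Approximate bounding box for Massachusetts (rounded, assumed values).
MA_BOUNDS = {"min_lat": 41.2, "max_lat": 42.9, "min_lng": -73.6, "max_lng": -69.8}


def outside_bounds(lat: float, lng: float, bounds: Dict[str, float] = MA_BOUNDS) -> bool:
    """True if the point falls outside the rectangular bounds."""
    return not (
        bounds["min_lat"] <= lat <= bounds["max_lat"]
        and bounds["min_lng"] <= lng <= bounds["max_lng"]
    )


def flag_outliers(pantries: List[Dict]) -> List[Dict]:
    """Return the pantries whose coordinates fall outside the Massachusetts box."""
    return [p for p in pantries if outside_bounds(p["lat"], p["lng"])]
```

This only catches points well outside the state. A marker in the ocean just off the coast, or an address geocoded to the wrong town, would still pass and need a closer check.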