DaanMatch / meeting-agendas

meeting-agendas/agendas/20220608Agenda #15

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

DaanMatch - Check-in — Meeting Agendas

https://daanmatch.github.io/meeting-agendas/agendas/20220608Agenda.html

shpatrickguo commented 2 years ago

Geocoding Addresses

Reached out to Apoorv for this project. Complete

You can find the Geocode repo with the script on Geocode API. As I pointed out last year, using open source APIs will lead to a lot of failures. In fact, as of right now, out of 170,000 addresses I have yet to come across a single success. D: Nevertheless, it is not feasible for the organization to spend money on the Google API/paid alternatives.

I came up with a couple of solutions:

  1. Since the open source APIs cannot deal with addresses at such granularity, we will truncate the addresses to capture only the locality. For example:

    # Failure with original address
    original_addr = "Room A-528, Near Dayal Market, Narela Road, Alipur, New Delhi, Delhi- 110036"

    # Success with truncated address
    new_addr = "Alipur, New Delhi, Delhi- 110036"
    OUTPUT = ('28.7959955', '77.1360706')

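  2. Use a series of open source APIs, and create a parent function that tries each of them in turn:

    def geocode(addr):
        """
        Geocode an address to latitude and longitude using multiple APIs and return the first non-failure.

        INPUT: string - address
        OUTPUT: tuple - (latitude, longitude), or ("Failed", "Failed")
        """
        failure = ("Failed", "Failed")
        # openstreetmaps, geopy, and API3 stand in for per-API wrapper functions,
        # each assumed to return the failure tuple rather than raise
        for api in (openstreetmaps, geopy, API3):
            result = api(addr)
            if result != failure:
                return result
        # All APIs failed
        return failure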
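For option 1, the truncation step might look like the minimal sketch below. The function name `truncate_address` and the `keep` parameter are hypothetical, and it assumes the locality sits in the last three comma-separated parts of the address, which holds for the example above but may not hold for every row.

```python
def truncate_address(addr: str, keep: int = 3) -> str:
    """Keep only the last `keep` comma-separated parts of an address.

    Assumption: the trailing parts (locality, city, state/pincode) are the
    ones an open source geocoder is most likely to recognise.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Split on commas, trim whitespace, and drop empty fragments
    parts = [p.strip() for p in addr.split(",") if p.strip()]
    # Slicing returns the whole list when there are fewer than `keep` parts
    return ", ".join(parts[-keep:])


original_addr = "Room A-528, Near Dayal Market, Narela Road, Alipur, New Delhi, Delhi- 110036"
new_addr = truncate_address(original_addr)
print(new_addr)  # Alipur, New Delhi, Delhi- 110036
```

A smaller `keep` gives the geocoder less detail to trip over, at the cost of a coarser location.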


Trying several APIs this way will allow us to explore:
- Which API performs the best
- Percentage of failures
- Any patterns in the failed addresses, e.g. the address is too short, has unusual characters, or has no pincode
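The three questions above could be answered with a small summary step, sketched below. It assumes the parent function is extended to record, for each address, the name of the API that succeeded (or `None` if all failed). The names `failure_flags` and `summarize`, the 15-character length threshold, and the set of characters treated as "normal" are all illustrative assumptions, not part of the existing script.

```python
import re
from collections import Counter

# Indian pincodes are six digits long
PINCODE = re.compile(r"\b\d{6}\b")
# Anything outside letters, digits, whitespace, and common address punctuation
UNUSUAL = re.compile(r"[^\w\s,.\-/()#&']")


def failure_flags(addr, min_length=15):
    """Return a list of possible reasons why an address failed to geocode."""
    flags = []
    if len(addr.strip()) < min_length:
        flags.append("too_short")
    if not PINCODE.search(addr):
        flags.append("no_pincode")
    if UNUSUAL.search(addr):
        flags.append("weird_characters")
    return flags


def summarize(results):
    """Summarize a list of (address, winning_api_name_or_None) pairs."""
    total = len(results)
    # Count how often each API produced the accepted result
    wins = Counter(api for _, api in results if api is not None)
    failed = [addr for addr, api in results if api is None]
    # Count suspected failure causes across all failed addresses
    flag_counts = Counter(f for addr in failed for f in failure_flags(addr))
    return {
        "wins_per_api": dict(wins),
        "failure_pct": 100 * len(failed) / total if total else 0.0,
        "failure_flags": dict(flag_counts),
    }


# Made-up rows for illustration only; not real results
sample = [
    ("Alipur, New Delhi, Delhi- 110036", "openstreetmaps"),
    ("Plot 7 @@ ??", None),
    ("Narela Road, Alipur, New Delhi", None),
]
summary = summarize(sample)
```

An address can carry several flags at once, so the flag counts may add up to more than the number of failures.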

### If this is complete
1. Pass the additional lat/long values along to Jarrett.
2. Update the database/datastore with the lat/long information.

### Caveats
We will not be able to use the previous lat/long values, because we are using different sources. Nothing obtained from the Google API can be reused.