dunkelstern / osmgeocoder

OpenStreetMap / OpenAddresses.io geocoder written in python
BSD 3-Clause "New" or "Revised" License

Can't geocode incomplete addresses #7

Open rjurney opened 3 years ago

rjurney commented 3 years ago

Our use case is that we have a billion addresses ranging from complete to just "United States, United States". Addresses are commonly missing information, and we need to geocode each one as specifically as possible so that we can compute the distance between pairs of addresses.

We've loaded all of the OSM data and are testing geocoding, and we have a problem: some of our addresses are incomplete and have no house number, or neither a street nor a house number. Others are otherwise complete but have no country. We want to geocode all of these as precisely as possible, falling back to the city level where necessary, but OSMGeocoder doesn't return any results for these three cases.

Can you help us to understand what changes we would have to make and where to handle these use cases?

1) In the absence of a house number, interpolate to the center of the street along its length.
2) In the absence of a house number and street, interpolate to the city center.
3) In the absence of a country, search for the best match among the other address parts.
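To make the first case concrete, here is a rough sketch of what "the center of the street along its length" could mean if the street is a polyline of coordinates. The function name `midpoint_along` and the plain `(x, y)` tuples are assumptions for illustration only, not anything from OSMGeocoder, and the distances are planar, so for real lon/lat data a projected geometry or a PostGIS function would likely be the better tool.

```python
import math


def midpoint_along(points):
    """Return the point halfway along a polyline, measured by length.

    `points` is a list of (x, y) tuples. Distances are planar (Euclidean),
    which is only an approximation for lon/lat coordinates.
    """
    if not points:
        raise ValueError("polyline needs at least one point")

    segments = list(zip(points, points[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    total = sum(lengths)
    if total == 0:
        # Single point, or all points identical.
        return points[0]

    # Walk the segments until half of the total length is used up.
    remaining = total / 2
    for (a, b), length in zip(segments, lengths):
        if length > 0 and remaining <= length:
            t = remaining / length
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= length

    # Only reached through floating point rounding on the last segment.
    return points[-1]
```

Note that this is not the same as the centroid of the street geometry: for a curved street the centroid can lie off the street, while the point above is always on it.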

We are willing to contribute these improvements if you can help us figure out how to build them.

Thanks!

dunkelstern commented 3 years ago

I have an idea for how to fix that: one could coalesce the geometry in the query, picking the house geometry first, then the street/road geometry, and so on until one of them returns a result.

Can you give me some examples to test on (preferably from a smaller region than just "the USA")? I think I can fix that issue sometime between now and the beginning of August.

If you want to try it yourself, start by looking at the forward geocoding functions:

I think you could probably use the COALESCE function of Postgres in these lines:

As you can see, it currently uses only the house geometry table. It should be possible to reference a centroid from the streets and cities tables instead (or even fall back to the country centroid if you have a country).
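
Roughly, the COALESCE fallback could look like the sketch below. This is not the actual osmgeocoder query or schema: the `house`, `street` and `city` tables, their columns, and the `geocode` helper are made up for illustration, and it runs against an in-memory SQLite database with geometries stored as plain text so it can be tried without PostGIS. The point is only that LEFT JOINs leave the missing levels as NULL and COALESCE then picks the most specific geometry that exists.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE city   (id INTEGER PRIMARY KEY, name TEXT, centroid TEXT);
        CREATE TABLE street (id INTEGER PRIMARY KEY, city_id INTEGER,
                             name TEXT, centroid TEXT);
        CREATE TABLE house  (id INTEGER PRIMARY KEY, street_id INTEGER,
                             number TEXT, location TEXT);
        INSERT INTO city   VALUES (1, 'Springfield', 'POINT(10 10)');
        INSERT INTO street VALUES (1, 1, 'Main Street', 'POINT(10.5 10.5)');
        INSERT INTO house  VALUES (1, 1, '12', 'POINT(10.51 10.52)');
    """)
    return conn


# LEFT JOINs keep the city row even when street or house do not match
# (a NULL parameter never matches), so COALESCE can fall back level by level.
FALLBACK_QUERY = """
SELECT
    COALESCE(h.location, s.centroid, c.centroid) AS location,
    CASE
        WHEN h.location IS NOT NULL THEN 'house'
        WHEN s.centroid IS NOT NULL THEN 'street'
        ELSE 'city'
    END AS matched_level
FROM city c
LEFT JOIN street s ON s.city_id = c.id AND s.name = :street
LEFT JOIN house  h ON h.street_id = s.id AND h.number = :number
WHERE c.name = :city
"""


def geocode(conn, city, street=None, number=None):
    """Return (location, matched_level), or None if the city is unknown."""
    params = {"city": city, "street": street, "number": number}
    return conn.execute(FALLBACK_QUERY, params).fetchone()
```

Returning the matched level next to the location lets the caller tell a house-level hit from a city-level fallback, which matters if the results are used to compute distances between address pairs. A country centroid could be added as one more argument to COALESCE in the same way.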