alephdata / aleph

Search and browse documents and data; find the people and companies you look for.
http://docs.aleph.occrp.org
MIT License
2.03k stars, 272 forks

Use libpostal to normalise extracted addresses #505

Closed: pudo closed this issue 4 years ago

pudo commented 6 years ago

libpostal (and its Python binding, pypostal) allows address strings to be parsed and normalised using a model trained on OpenStreetMap data. This would increase the likelihood that addresses match on their string values, and thus provide connectivity between disparate source records.
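As a rough illustration of why normalisation helps matching, the sketch below reduces address strings to comparison keys and groups records that share a key. It is a toy stand-in that runs without libpostal. The abbreviation table and the names `normalise_address` and `group_by_address` are hypothetical. With pypostal installed, `postal.expand.expand_address` would supply the normalised variants instead. It returns a list of candidate expansions, so each record would be indexed under every variant. `postal.parser.parse_address` would split the string into labelled components.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical toy abbreviation table, standing in for the much larger
# multilingual dictionaries that libpostal ships with.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalise_address(raw: str) -> str:
    """Reduce an address string to a crude comparison key (toy version)."""
    # Decompose accented characters and drop the combining marks ("É" -> "E").
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(c for c in text if not unicodedata.combining(c))
    # Lowercase, split on anything that is not a word character, expand abbreviations.
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def group_by_address(records):
    """Group (record_id, address) pairs by normalised key; keep only shared keys."""
    groups = defaultdict(set)
    for record_id, address in records:
        groups[normalise_address(address)].add(record_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [("a", "12 Main St."), ("b", "12 main street"), ("c", "4 Elm Rd")]
print(group_by_address(records))
```

Here "12 Main St." and "12 main street" collapse to the same key, so records `a` and `b` become linked, while a comparison of the raw strings would never match them.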

Open points on this:

felixebert commented 5 years ago

I think an extra, optional Docker service using libpostal for address parsing, plus a service like Photon (by komoot) for (offline) geocoding, would be nice.
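A minimal sketch of the geocoding half follows. It assumes a self-hosted Photon instance reachable at `http://localhost:2322`, which is Photon's usual default port, and uses its `/api` search endpoint with the `q` and `limit` parameters. Photon answers with a GeoJSON `FeatureCollection`, whose point coordinates are ordered `[lon, lat]`. The helper names are hypothetical, and only the standard library is used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a self-hosted Photon service on its default port.
PHOTON_URL = "http://localhost:2322/api"


def photon_query_url(address: str, limit: int = 1, base_url: str = PHOTON_URL) -> str:
    """Build a Photon search URL for a free-text address."""
    return f"{base_url}?{urlencode({'q': address, 'limit': limit})}"


def parse_photon_response(payload: dict):
    """Extract (lat, lon, name) tuples from a GeoJSON FeatureCollection."""
    results = []
    for feature in payload.get("features", []):
        lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order is [lon, lat]
        name = feature.get("properties", {}).get("name")
        results.append((lat, lon, name))
    return results


def geocode(address: str, limit: int = 1, base_url: str = PHOTON_URL):
    """Query the Photon service (must be running) and parse the result."""
    with urlopen(photon_query_url(address, limit, base_url), timeout=10) as resp:
        return parse_photon_response(json.load(resp))
```

Calling `geocode("Alexanderplatz, Berlin")` requires the Photon service to be running. The URL builder and response parser can be exercised without it.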

pudo commented 5 years ago

Thanks for the pointers to offline geocoders, that would indeed be an amazing piece of infra to get going.

Note to self: Working on alephdata/opensanctions right now, and it's obvious we need an Address object in followthemoney. Every sanctions list has them split into components, and we're just string-joining it into a mess.
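The difference between keeping address components and string-joining them can be pictured with a small structured record. This is a hypothetical sketch. The `Address` class, its field names and its methods are illustrative only and are not the followthemoney schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Address:
    """Hypothetical structured address; not the followthemoney schema."""
    street: Optional[str] = None
    postal_code: Optional[str] = None
    city: Optional[str] = None
    region: Optional[str] = None
    country: Optional[str] = None

    def full(self) -> str:
        """Join only the non-empty components, in field order."""
        parts = [getattr(self, f.name) for f in fields(self)]
        return ", ".join(p.strip() for p in parts if p and p.strip())

    def match_key(self) -> tuple:
        """Per-component comparison key: trimmed, lowercased, '' for missing."""
        return tuple((getattr(self, f.name) or "").strip().lower() for f in fields(self))


addr = Address(street="12 Main Street", city="Springfield", country="US")
print(addr.full())
```

Because the components stay separate, two records can be compared field by field, or on `postal_code` and `country` alone. The display string is built only at the end, without the stray separators that joining empty fields produces.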

pudo commented 4 years ago

No actual momentum for doing this, closing.

rjurney commented 3 years ago

I have a great use case for this... and can do it in open source.

Rosencrantz commented 3 years ago

@rjurney We'd be very interested in accepting changes and in working with you to implement this, if it's something you'd like to take on.