CitadelOnTheMove / converter-lib

CITADEL on the move - The Converter Java library

Automatic geocoding #7

Open: tthoeye opened this issue 10 years ago

tthoeye commented 10 years ago

It would be very helpful if geocoding could be done by the Converter itself, instead of relying on third-party tools. We could use the Google Geocoding API, which returns coordinates in the WGS84 coordinate system:

http://maps.googleapis.com/maps/api/geocode/output?parameters
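For reference, a minimal sketch of what building such a request could look like from Java. In the URL template above, `output` is the response format (`json` or `xml`). The sketch assumes JSON, the `address` parameter, and an API key passed as `key`. Current versions of the API require a key and are served over HTTPS. The class name and key are placeholders, and the parameter list should be checked against Google's current documentation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: building a forward-geocoding request URL for the Google Geocoding API.
// Assumptions: "json" output format, "address" and "key" query parameters.
// GoogleGeocodeUrl is a hypothetical helper, not part of converter-lib.
final class GoogleGeocodeUrl {
    private static final String BASE = "https://maps.googleapis.com/maps/api/geocode/";

    private GoogleGeocodeUrl() {}

    static String forAddress(String address, String apiKey) {
        // URLEncoder applies form encoding: spaces become '+', commas become %2C.
        String q = URLEncoder.encode(address, StandardCharsets.UTF_8);
        String k = URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
        return BASE + "json?address=" + q + "&key=" + k;
    }

    public static void main(String[] args) {
        // "YOUR_API_KEY" is a placeholder, not a real credential.
        System.out.println(forAddress("Piazza Pretoria, Palermo", "YOUR_API_KEY"));
    }
}
```

In a JSON response, the coordinates of each match sit under `results[i].geometry.location` as `lat`/`lng`, in WGS84 decimal degrees.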
jexxon53 commented 10 years ago

I tried using Google Refine (I can't work out exactly what has happened to it since) on a file from Palermo. As I understand it, Refine uses this API, but through a rather complicated process (you have to call the API, then eliminate a series of columns, etc.). I ran into two main problems: 1. it took a very long time, since Refine, at least, called the API once for each row; and 2. it made me realise how incredibly inconsistent addresses are as entered in databases (at least in Palermo). If that's the state of the art, I think it would be better to set geocoding up as a separate tool. Incorporating it into the Converter would be frustrating, because people with messy address lists needing a lot of clean-up would think it's the Converter that isn't working.
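The per-row cost could be reduced by sending each distinct address only once and pacing the remote calls. Here is a rough sketch that assumes a pluggable lookup function. That function is hypothetical and stands in for the actual HTTP call; `CachingGeocoder` is not part of converter-lib.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Sketch: query the geocoder once per distinct address rather than once per row.
// The lookup function is a hypothetical hook standing in for a real HTTP call.
final class CachingGeocoder {
    private final Function<String, Optional<double[]>> lookup; // {lat, lon}, or empty if no match
    private final long minDelayMillis;                          // minimum gap between remote calls
    private final Map<String, Optional<double[]>> cache = new HashMap<>();
    private long lastCallNanos;
    private int remoteCalls;

    CachingGeocoder(Function<String, Optional<double[]>> lookup, long minDelayMillis) {
        this.lookup = lookup;
        this.minDelayMillis = minDelayMillis;
    }

    // Cache key: trimmed, whitespace-collapsed, lower-cased. Only catches trivial variants.
    static String cacheKey(String address) {
        return address.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    Optional<double[]> geocode(String address) {
        String key = cacheKey(address);
        Optional<double[]> cached = cache.get(key);
        if (cached != null) {
            return cached; // repeated address (including earlier misses): no remote call
        }
        throttle();
        Optional<double[]> result = lookup.apply(address.trim());
        remoteCalls++;
        cache.put(key, result);
        return result;
    }

    int remoteCalls() {
        return remoteCalls;
    }

    // Sleeps if the previous remote call was less than minDelayMillis ago.
    private void throttle() {
        if (remoteCalls > 0) {
            long elapsedMillis = (System.nanoTime() - lastCallNanos) / 1_000_000L;
            long waitMillis = minDelayMillis - elapsedMillis;
            if (waitMillis > 0) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                }
            }
        }
        lastCallNanos = System.nanoTime();
    }
}
```

The cache key only absorbs trivial differences such as case and extra spaces. It does nothing about the deeper inconsistencies mentioned above, such as abbreviations or missing street numbers.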

tthoeye commented 10 years ago

My experience with the Google Geocoding API is that it can geocode even the most inconsistent addresses, in stark contrast with the ones provided by governments :) Also, thanks to the semantic mapping step, people should be giving more thought to how their addresses are formulated when using the Converter, as opposed to pasting a batch of addresses into some online tool.

Not many people know how to geocode addresses, or how to find the right tools themselves, without clear instructions. Entering the coordinates correctly is definitely one of the harder parts of using the Converter right now, so anything that makes this easier would be good.
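One thing the Converter, or a separate tool, could do cheaply is reject coordinates that cannot be valid WGS84 decimal degrees before they end up in a dataset. Here is a minimal sketch. It assumes coordinates arrive as a single "lat,lon" string with latitude first; both the class and this input format are hypothetical.

```java
import java.util.Locale;

// Sketch: sanity checks for WGS84 decimal-degree coordinates.
// Assumed input format "lat,lon" (latitude first). Decimal commas such as
// "38,1157, 13,3615" yield four parts and are rejected rather than guessed at.
final class Wgs84Point {
    final double lat;
    final double lon;

    Wgs84Point(double lat, double lon) {
        // NaN fails every comparison, so it is checked explicitly.
        if (Double.isNaN(lat) || lat < -90.0 || lat > 90.0) {
            throw new IllegalArgumentException("latitude out of range: " + lat);
        }
        if (Double.isNaN(lon) || lon < -180.0 || lon > 180.0) {
            throw new IllegalArgumentException("longitude out of range: " + lon);
        }
        this.lat = lat;
        this.lon = lon;
    }

    static Wgs84Point parse(String text) {
        String[] parts = text.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'lat,lon': " + text);
        }
        // parseDouble throws NumberFormatException (an IllegalArgumentException) on bad numbers.
        return new Wgs84Point(Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim()));
    }

    @Override
    public String toString() {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
        return String.format(Locale.ROOT, "%.6f,%.6f", lat, lon);
    }
}
```

Range checks won't catch everything. For example, swapped latitude and longitude that both fall within ±90° still pass. They do, however, catch values outside ±90°/±180°, such as projected coordinates that were never converted to WGS84.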

jexxon53 commented 10 years ago

I totally agree that it's a very important issue, and I've also seen that the Google API forgives a lot of inconsistencies. You're right that doing the semantic mapping makes people realise how important it is to structure data, and addresses are where this first becomes evident. While full-scale integration into the Converter is perhaps a long-term task, we could start with a short tutorial on data consistency that uses addresses as the main example (even street names, abbreviations and street numbers are all very contentious issues...). Or maybe a thematic workshop using the API, or some other tool, together with the Converter?
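As an example of the kind of clean-up such a tutorial could cover, here is a small sketch that expands street-type abbreviations before geocoding. The abbreviation table is illustrative only, covering a few common Italian forms, and the class is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: expand street-type abbreviations token by token and collapse extra spaces.
// The table is an illustrative assumption; a real one would need review by local data owners.
final class AddressNormalizer {
    private static final Map<String, String> ABBREVIATIONS = Map.of(
            "v.", "Via",
            "v.le", "Viale",
            "p.zza", "Piazza",
            "c.so", "Corso");

    private AddressNormalizer() {}

    static String normalize(String raw) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : raw.trim().split("\\s+")) {
            // Case-insensitive lookup; unknown tokens are kept exactly as written.
            String expanded = ABBREVIATIONS.get(token.toLowerCase(Locale.ROOT));
            out.add(expanded != null ? expanded : token);
        }
        return out.toString();
    }
}
```

A token-by-token lookup like this misses abbreviations glued to the street name (e.g. "V.Roma"), which is exactly the kind of case a tutorial would need to discuss.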

Facyla commented 10 years ago

There is also OSM's Nominatim service, but the OSM community is not fully happy with it and is working on another geocoder that would tolerate more mistakes and malformed addresses.
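For comparison, here is a sketch of a Nominatim search request using the JDK's `java.net.http` client (Java 11+). It assumes the public instance at nominatim.openstreetmap.org and its `/search` endpoint with the `q`, `format` and `limit` parameters. The User-Agent value is a placeholder. The public instance's usage policy asks clients to identify themselves and to send no more than one request per second, so bulk jobs should be throttled or pointed at a self-hosted instance. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch: a forward-geocoding request to the public Nominatim instance.
// Assumed parameters: q (free-form query), format=json, limit=1.
// NominatimSearch is a hypothetical helper, not part of converter-lib.
final class NominatimSearch {
    static final String BASE = "https://nominatim.openstreetmap.org/search";

    private NominatimSearch() {}

    static URI searchUri(String address) {
        String q = URLEncoder.encode(address, StandardCharsets.UTF_8);
        return URI.create(BASE + "?q=" + q + "&format=json&limit=1");
    }

    // Builds (but does not send) a GET request carrying an identifying User-Agent.
    static HttpRequest searchRequest(String address, String userAgent) {
        return HttpRequest.newBuilder(searchUri(address))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }
}
```

With `format=json`, the response is a JSON array of matches, each carrying `lat` and `lon` as strings.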

I agree it's useful, but maybe it should be kept separate from the Converter itself, since cleaning, enriching and geocoding data is really a different process? Reverse geocoding could also be useful, as the template requires some address fields.
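Reverse geocoding goes the other way, from coordinates to address fields. Below is a sketch of the request URLs for both services discussed here. It assumes Nominatim's `/reverse` endpoint (with `lat`, `lon` and `format`) and the Google API's `latlng` parameter. The key is a placeholder.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Sketch: reverse-geocoding request URLs (coordinates -> address).
// Locale.ROOT guarantees a '.' decimal separator whatever the JVM's default locale.
// ReverseGeocodeUrls is a hypothetical helper, not part of converter-lib.
final class ReverseGeocodeUrls {
    private ReverseGeocodeUrls() {}

    // Nominatim /reverse endpoint (assumed parameters: lat, lon, format).
    static String nominatim(double lat, double lon) {
        return String.format(Locale.ROOT,
                "https://nominatim.openstreetmap.org/reverse?lat=%.6f&lon=%.6f&format=json", lat, lon);
    }

    // Google Geocoding API reverse lookup via latlng; apiKey is a placeholder.
    static String google(double lat, double lon, String apiKey) {
        return String.format(Locale.ROOT,
                "https://maps.googleapis.com/maps/api/geocode/json?latlng=%.6f,%.6f&key=%s",
                lat, lon, URLEncoder.encode(apiKey, StandardCharsets.UTF_8));
    }
}
```

Formatting with `Locale.ROOT` matters here. With an Italian default locale, `String.format("%.6f", ...)` would write a decimal comma and produce a broken URL. Nominatim's reverse response includes an `address` object split into components such as `road`, `house_number` and `postcode`, which could be mapped onto the template's address fields.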

Whatever the solution, documenting the various tools and options will always be useful!