Open AbelVM opened 8 years ago
For testing purposes, this CSV file has a list of official names of municipalities in Alicante province in Spain:
Most of them are not recognized by CARTO.
There is an enhancement request for this problem: https://github.com/CartoDB/cartodb/issues/9131
Oeeee! oeeee oeeee oeeee!
Oh, I just saw this issue! @AbelVM, in the past we intended to make the namedplaces search fuzzy. The strings in most of the other geocoding processes are normalized, but not in this one, which makes it perform pretty badly with complex names (accents, hyphens, spaces...). For the other processes, we store a normalized name in the DB and then run a regexp over the input to normalize it in the same way. I think this could be a nice leapfrog ;-) From Geonames we have a ton of synonyms for each place, but if accents (or other characters) don't match, the lookup just fails.
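To make the "normalize both sides the same way" idea concrete, here is a minimal Python sketch. It is not the code used by the geocoder; the function name `normalize_place_name` and the exact rules (strip accents, lowercase, turn punctuation into spaces) are assumptions chosen for illustration. The same function would be applied to the stored names and to the user input before comparing them.

```python
import re
import unicodedata


def normalize_place_name(name: str) -> str:
    """Hypothetical helper: build a comparison key for a place name."""
    # Split accented characters into base letter + combining mark (é -> e + ´).
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks so that "Jaén" and "JAEN" share the same letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then collapse hyphens, slashes, apostrophes and repeated
    # whitespace into single spaces.
    cleaned = re.sub(r"[\W_]+", " ", stripped.casefold())
    return cleaned.strip()


if __name__ == "__main__":
    for raw in ["Jaén", "JAEN", "Alcoy/Alcoi", "l'Alfàs  del Pi"]:
        print(repr(raw), "->", repr(normalize_place_name(raw)))
```

Note that this also folds `ñ` into `n`, which is convenient for matching but may merge names that are genuinely different; whether that trade-off is acceptable is a design decision.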
:+1: for a leapfrog testing different approaches:
I'm game
We may have different names for the same place, even in the official language(s) of the country, so requiring a strict match to geocode leads to many failures and "holes" in the results. E.g.:
As of today, loading a CSV with the provinces of Spain may produce several holes, always due to accents (accents on uppercase letters are frequently omitted, so JAEN <> Jaén), optional articles, and the different co-official languages in different regions.
Maybe we should make use of a fuzzy search, such as tsvector (full-text search) or trigrams.
Tsvector sample pseudocode:
It would be much faster if we precompute a tsvector column in the geometries table.
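As a rough illustration of the trigram option and of precomputing the searchable form once instead of per query, here is a small Python sketch. It is only loosely modeled on how trigram matching works in PostgreSQL (word padding, shared trigrams divided by total distinct trigrams) and is not guaranteed to reproduce its scores. The names `trigrams`, `build_index`, `best_match` and the `0.3` threshold are assumptions for this example, and only lowercasing is applied here; accents would still need to be stripped beforehand.

```python
def trigrams(text: str) -> set:
    """Return the set of 3-character chunks of each word, padded with spaces."""
    grams = set()
    for word in text.lower().split():
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def build_index(names):
    """Precompute trigram sets once, analogous to a precomputed column."""
    return {name: trigrams(name) for name in names}


def best_match(query: str, index: dict, threshold: float = 0.3):
    """Return the indexed name most similar to `query`, or None if too distant."""
    query_grams = trigrams(query)
    best_name, best_score = None, 0.0
    for name, grams in index.items():
        union = query_grams | grams
        # Similarity = shared trigrams / all distinct trigrams of both strings.
        score = len(query_grams & grams) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    index = build_index(["Alicante", "Alcoy", "Elche"])
    print(best_match("Alicant", index))   # close spelling of an indexed name
    print(best_match("Zaragoza", index))  # nothing similar in the index
```

In a database, the linear scan in `best_match` would be replaced by an index lookup; the sketch only shows why precomputing the per-name representation avoids repeating that work on every query.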
More comments about this at: https://github.com/CartoDB/dataservices-api/issues/251
cc @ethervoid