Closed adam-lynch closed 2 years ago
Here are my notes so far. I'm summarizing this from memory; exact details can be looked up in the OpenStreetMap (OSM) wiki.
OSM has a `name` tag on a street, which I use already. So it's easy to show the preferred language. If that name is in English for example, there should be a `name:en` tag.
If the street has names in other languages, it will have a tag like `name:fr`. The language code can follow one of two patterns / conventions. OSM contributors are advised not to add alternative name tags unless the names really exist, and not to translate directly for the sake of it.
I don't think there's a tag on a street which tells you which language `name` is using. If I needed to, I could compare `name` and `name:*`. I'm guessing that would usually be reliable but could occasionally fail.
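A minimal sketch of that comparison, assuming the street's tags are already available as a plain dictionary. The function name, the key pattern, and the example tag values are illustrative assumptions, not something taken from the OSM wiki.

```python
import re
from typing import Dict, Optional

# Assumption: language-specific keys look like "name:en" or "name:zh-Hant".
# This deliberately skips other suffixed keys such as "name:left".
LANGUAGE_KEY = re.compile(r"^name:([a-z]{2,3}(?:-[A-Za-z0-9]+)*)$")


def infer_name_language(tags: Dict[str, str]) -> Optional[str]:
    """Return the language code whose name:<code> value matches `name`."""
    name = tags.get("name")
    if not name:
        return None
    normalized = name.strip().casefold()
    for key, value in tags.items():
        match = LANGUAGE_KEY.match(key)
        if match and value.strip().casefold() == normalized:
            return match.group(1)
    # No name:* tag repeats the main name, so the language stays unknown.
    return None


# Hypothetical tag values, for illustration only.
example = {
    "name": "Main Street",
    "name:en": "Main Street",
    "name:ga": "An tSráid Mhór",
}
print(infer_name_language(example))
```

The exact-match comparison is the part that "could occasionally fail": any spelling difference between `name` and the matching `name:*` value makes the function return `None`.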
I don't think I'll use them but I found two interesting tags:
- `loc_name` represents a local (sometimes slang) name for something. E.g. Daly's bridge in Cork is better known as "the shaky bridge".
- `official_name` covers the reverse situation: contributors can put the formal name in the `official_name` tag and use the local name as the `name`.

There are other less-interesting name-related tags too.
I don't think I can blindly just take the first name which isn't the main name and show it. Side note: there could be more than two names. There could also be a second name I don't want to show, maybe? I don't know enough about this but maybe there's a variant which is very similar to the main name, e.g. simplified vs traditional Chinese.
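One possible way to avoid showing a second name that is nearly the same as the main one is a string-similarity filter. This is only a sketch: the 0.8 threshold, the function name, and the key pattern are assumptions, and a character-based ratio would not help with script variants such as simplified vs traditional Chinese.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Assumption: language-specific keys look like "name:en" or "name:zh-Hant".
LANGUAGE_KEY = re.compile(r"^name:([a-z]{2,3}(?:-[A-Za-z0-9]+)*)$")

# Assumption: names at or above this similarity count as "the same name".
SIMILARITY_THRESHOLD = 0.8


def distinct_alternative_names(
    tags: Dict[str, str], threshold: float = SIMILARITY_THRESHOLD
) -> List[Tuple[str, str]]:
    """Return (language, name) pairs that differ enough from the main name."""
    main = tags.get("name", "").casefold()
    result = []
    for key, value in sorted(tags.items()):
        match = LANGUAGE_KEY.match(key)
        if not match:
            continue
        similarity = SequenceMatcher(None, main, value.casefold()).ratio()
        if similarity < threshold:
            result.append((match.group(1), value))
    return result


# Hypothetical tag values, for illustration only.
print(distinct_alternative_names({
    "name": "Main Street",
    "name:en": "Main Street",
    "name:ga": "An tSráid Mhór",
}))
```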
I don't mind hardcoding an ordered list of preferred languages per country if I have to (and I don't mind releasing this for a small subset of countries to begin with). How can I figure out which country it is?
So it seems this needs to be computed per-street.
I can't easily figure it out based on street coordinates on their own and streets don't seem to have a tag indicating which country they're in.
I could pull in the country elements which overlap/contain the area circle from OSM (as well as the streets).
After a quick Nominatim search, I see there's an element which gives the polygon of Ireland plus some information: https://nominatim.openstreetmap.org/ui/details.html?osmtype=R&osmid=62273&class=boundary. It has a `default_language=en` tag. It's hard to see at the bottom but it has `place:country_code=ie` and `place:country=Ireland` tags. At least I think it does, I'm not sure why it's greyed out.
I'm not sure how these relations work but there's a "linked place"; a node (not a polygon) representing Ireland: https://www.openstreetmap.org/node/1420871007. This seems to have a lot of tags the other one didn't have but nothing too useful it seems.
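If a country polygon were available locally, deciding whether a street's coordinates fall inside it could be sketched with a standard ray-casting test. Everything here is an assumption for illustration: the function names, the single-ring polygon shape, and especially the crude rectangle standing in for Ireland, which is not the real boundary.

```python
from typing import Dict, List, Optional, Tuple

Ring = List[Tuple[float, float]]  # (longitude, latitude) pairs


def point_in_polygon(lon: float, lat: float, ring: Ring) -> bool:
    """Ray casting: count how many edges a ray going east from the point crosses."""
    inside = False
    count = len(ring)
    for i in range(count):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % count]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def country_for_point(lon: float, lat: float, countries: Dict[str, Ring]) -> Optional[str]:
    """Return the first country code whose ring contains the point."""
    for code, ring in countries.items():
        if point_in_polygon(lon, lat, ring):
            return code
    return None


# Crude placeholder rectangle, NOT the real outline of Ireland.
COUNTRIES = {"ie": [(-11.0, 51.0), (-5.0, 51.0), (-5.0, 56.0), (-11.0, 56.0)]}

print(country_for_point(-8.47, 51.90, COUNTRIES))  # a point near Cork
```

A real boundary relation has many rings and far more points, so this only shows the shape of the check, not a practical way to run it per street.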
What would be ideal is a tag which tells me which language is the second-most preferred language (or even better, an ordered list). Again though, this can vary within a country too.
Oh, I should say... a good way to decide on this stuff is to look at what street signs look like in these areas. Related: #28
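In the absence of such a tag, the hardcoded per-country fallback mentioned earlier could look roughly like this. It is a sketch under assumptions: the mapping name, the function name, and the second entry in the table are hypothetical, and the country code is assumed to be known already.

```python
from typing import Dict, List, Optional

# Hand-maintained and illustrative only. "ie" reflects the default_language=en
# tag seen on the Ireland relation plus Irish; "be" is a made-up example entry.
PREFERRED_LANGUAGES_BY_COUNTRY: Dict[str, List[str]] = {
    "ie": ["en", "ga"],
    "be": ["nl", "fr", "de"],
}


def pick_alternative_name(tags: Dict[str, str], country_code: str) -> Optional[str]:
    """Return the first preferred-language name that differs from `name`."""
    main = tags.get("name")
    languages = PREFERRED_LANGUAGES_BY_COUNTRY.get(country_code.lower(), [])
    for language in languages:
        candidate = tags.get("name:" + language)
        if candidate and candidate != main:
            return candidate
    # Unknown country, or no usable name:* tag for its languages.
    return None


# Hypothetical tag values, for illustration only.
street = {"name": "Main Street", "name:en": "Main Street", "name:ga": "An tSráid Mhór"}
print(pick_alternative_name(street, "ie"))
```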
I've started this on the `alternative_names` branch. What's left is to figure out the country code for each street.
It doesn't seem possible to either:
- Read the country code from the `way` itself. It's just not in the data (most of the time).
- Fetch the country polygons and, for each `way`, check which polygon it's in to figure out the country code. I haven't found a query that can get me this. Even just getting all of the countries on their own is very slow. Plus the polygons would've had too many points anyway.

I also had a misunderstanding; `name:{{main_language}}` isn't guaranteed to exist. It only exists when there are multiple names.
https://wiki.openstreetmap.org/wiki/Multilingual_names is helpful, it shows that things are even more complicated in some cases.
Two new ideas:
- Show an alternative name (`name:*`) without knowing the country or whether it's an official / preferred language. E.g. if there are multiple names, use the `name` plus the `name:*` for the most popular language (so loop over the language codes from most popular to least and take the first matching tag). This could have bad cases but it might be acceptable. It would be unacceptable if it were city names we were talking about because there can be a lot of translations in OSM for those. But we're only talking about `way`s so it's fine.
- There could be potential for making this a little smarter on top by altering the logic slightly if we can detect the country from the `name:*` found. E.g. maybe we can assume it's Ireland if we find `name:ga`.
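Both ideas can be sketched together. The popularity ordering below is a placeholder rather than real statistics, and the constant and function names are hypothetical. The only hint taken from these notes is `name:ga` suggesting Ireland.

```python
from typing import Dict, List, Optional, Tuple

# Placeholder ordering, NOT based on real usage figures.
LANGUAGES_BY_POPULARITY: List[str] = ["en", "es", "fr", "de", "ga"]

# Language codes that hint at a country. Only the "ga" entry comes from the notes.
COUNTRY_HINTS: Dict[str, str] = {"ga": "ie"}


def pick_by_popularity(tags: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (language, name) for the most popular name:* differing from `name`."""
    main = tags.get("name")
    for language in LANGUAGES_BY_POPULARITY:
        candidate = tags.get("name:" + language)
        if candidate and candidate != main:
            return language, candidate
    return None


def guess_country(tags: Dict[str, str]) -> Optional[str]:
    """Guess a country code from the presence of a telltale name:* tag."""
    for language, country in COUNTRY_HINTS.items():
        if "name:" + language in tags:
            return country
    return None


# Hypothetical tag values, for illustration only.
street = {"name": "Main Street", "name:en": "Main Street", "name:ga": "An tSráid Mhór"}
print(pick_by_popularity(street), guess_country(street))
```

The only filter here is exact inequality with `name`, so a `name:*` value that is merely a spelling variant of the main name would still be picked.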
I'll need to review https://wiki.openstreetmap.org/wiki/Multilingual_names again and decide.
Another idea (which isn't mutually exclusive) is to factor in the user's browser language setting.
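A rough sketch of what factoring in the browser setting might mean, assuming the front end passes along an ordered list of language tags (such as the browser's `navigator.languages` value). The function name and the fallback from a regional tag to its primary language are assumptions.

```python
from typing import Dict, List, Optional


def pick_name_for_browser(tags: Dict[str, str], browser_languages: List[str]) -> Optional[str]:
    """Return the first name:* matching the user's languages and differing from `name`."""
    main = tags.get("name")
    for language_tag in browser_languages:
        # Try the tag as given ("zh-Hant"), then its primary part ("zh").
        primary = language_tag.split("-")[0].lower()
        for code in (language_tag, primary):
            candidate = tags.get("name:" + code)
            if candidate and candidate != main:
                return candidate
    return None


# Hypothetical tag values, for illustration only.
street = {"name": "Main Street", "name:en": "Main Street", "name:ga": "An tSráid Mhór"}
print(pick_name_for_browser(street, ["ga-IE", "en"]))
```

This could run before a popularity-based or per-country fallback, since the two are not mutually exclusive.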
https://nominatim.org/release-docs/develop/api/Reverse/ or https://nominatim.org/release-docs/develop/api/Lookup/ might've been helpful but I don't want to do too many API requests (especially when rate-limited).
Edit: https://operations.osmfoundation.org/policies/nominatim/
I think I have to give up on the idea of using the most popular language as the alternative name due to cases like this:
`name` is "St Paul's Avenue". `name:en` is "Saint Paul's Avenue". (I've fixed this particular case in OSM)
This reminded me: https://news.ycombinator.com/item?id=30739571. Related: #28