adam-lynch / back-of-your-hand

How well do you know your area? Test your knowledge by locating streets with this game.
https://backofyourhand.com
Mozilla Public License 2.0
80 stars, 9 forks

Show multiple languages for more countries than Ireland #27

Closed adam-lynch closed 2 years ago

adam-lynch commented 2 years ago

This reminded me: https://news.ycombinator.com/item?id=30739571. Related: #28

adam-lynch commented 2 years ago

Here are my notes so far. I'm summarizing this from memory; exact details can be looked up in the OpenStreetMap (OSM) wiki.

Each street in OSM has a name tag, which I already use, so it's easy to show the name in the preferred language. If that name is in English, for example, there should also be a name:en tag.

If the street has names in other languages, it will have a tag like name:fr. The language code can follow one of two patterns / conventions. OSM contributors are advised not to add alternative name tags unless the names really exist, and not to translate directly for the sake of it.

I don't think there's a tag on a street which tells you which language name is in. If I needed to, I could compare name against each name:* tag. I'm guessing that would be reliable but could fail in rare cases.
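A minimal sketch of that comparison, assuming a street's tags are available as a plain dictionary. The function name and the sample tags are hypothetical, made up for illustration rather than taken from OSM data or from this project's code.

```python
from typing import Dict, Optional


def infer_name_language(tags: Dict[str, str]) -> Optional[str]:
    """Return the language code whose name:<code> value equals name, if any."""
    main = tags.get("name")
    if main is None:
        return None
    for key, value in tags.items():
        # Only language-suffixed name tags are considered here.
        if key.startswith("name:") and value == main:
            return key[len("name:"):]
    # No name:* tag matches exactly, so the language stays unknown.
    return None


# Illustrative tags only (not real OSM data).
example_tags = {
    "name": "Main Street",
    "name:en": "Main Street",
    "name:ga": "An tSráid Mhór",
}
language = infer_name_language(example_tags)
```

This only works when one of the name:* values is an exact copy of name; any difference in spelling or abbreviation makes it return None.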

I don't think I'll use them but I found two interesting tags;

There are other less-interesting name related tags too.


I don't think I can blindly just take the first name which isn't the main name and show it. Side note: there could be more than two names. There could also be a second name I don't want to show, maybe? I don't know enough about this but maybe there's a variant which is very similar to the main name, e.g. simplified vs traditional Chinese.

I don't mind hardcoding an ordered list of preferred languages per country if I have to (and I don't mind releasing this for a small subset of countries to begin with). How can I figure out which country it is?

So it seems this needs to be computed per street.

I can't easily figure it out based on street coordinates on their own and streets don't seem to have a tag indicating which country they're in.

I could pull in the country elements which overlap/contain the area circle from OSM (as well as the streets).

After a quick Nominatim search, I see there's an element which gives the polygon of Ireland plus some information: https://nominatim.openstreetmap.org/ui/details.html?osmtype=R&osmid=62273&class=boundary. It has a default_language=en tag. It's hard to see at the bottom but it has place:country_code=ie and place:country=Ireland tags. At least I think it does; I'm not sure why it's greyed out.

I'm not sure how these relations work but there's a "linked place"; a node (not a polygon) representing Ireland: https://www.openstreetmap.org/node/1420871007. This seems to have a lot of tags the other one didn't have but nothing too useful it seems.

What would be ideal is a tag which tells me which language is the second-most preferred (or, even better, an ordered list). Again though, this can vary within a country too.

adam-lynch commented 2 years ago

Oh, I should say... a good way to decide on this stuff is to look at what street signs look like in these areas. Related: #28

adam-lynch commented 2 years ago

I've started this on the alternative_names branch. What's left is to figure out the country code for each street.

adam-lynch commented 2 years ago

It doesn't seem possible to either:

  1. Get the country code with each way. It's just not in the data (most of the time).
  2. Alternatively, get every country (including polygon) which intersects the area. Later, for each way I'd check which polygon it's in to figure out the country code. I haven't found a query that can get me this. Even just getting all of the countries on their own is very slow. Plus the polygons would've had too many points anyway.

I also had a misunderstanding; name:{{main_language}} isn't guaranteed to exist. It only exists when there are multiple names.

https://wiki.openstreetmap.org/wiki/Multilingual_names is helpful; it shows that things are even more complicated in some cases.

Two new ideas:

  1. Keep a hardcoded list of country codes to simplified polygons / rectangles. I don't like this because it won't be accurate and it could be slow (i.e. to query a coordinate against N polygons).
  2. Use an alternative name (name:*) without knowing the country or that it's an official / preferred language. E.g. if there are multiple names, use the name plus the name:* related to the most popular language (so loop over the language codes from most popular to least and take the first related tag). This could have bad cases but it might be acceptable.
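The second idea could look roughly like the sketch below. The popularity ordering is a placeholder rather than researched data, and skipping a name:* value identical to name is an assumption added here so the same text isn't shown twice. The function name and sample tags are hypothetical.

```python
from typing import Dict, List, Optional

# Assumption: a hand-maintained ordering, most popular first.
# These codes and their order are placeholders, not researched data.
LANGUAGES_BY_POPULARITY: List[str] = ["en", "zh", "es", "fr", "nl", "ga"]


def pick_alternative_name(
    tags: Dict[str, str],
    languages: List[str] = LANGUAGES_BY_POPULARITY,
) -> Optional[str]:
    """Return the first name:<code> value, in popularity order, that differs from name."""
    main = tags.get("name")
    for code in languages:
        candidate = tags.get(f"name:{code}")
        # Assumption: an exact copy of the main name is not worth showing.
        if candidate and candidate != main:
            return candidate
    return None


# Illustrative tags only (not real OSM data).
street = {
    "name": "Rue de la Loi",
    "name:fr": "Rue de la Loi",
    "name:nl": "Wetstraat",
}
alternative = pick_alternative_name(street)
```

The exact-match check only filters identical strings, so two spellings of the same name would still be treated as different names.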

It would be unacceptable if it were city names we were talking about because there can be a lot of translations in OSM for those. But we're only talking about ways so it's fine.

There could be potential for making this a little smarter on top by altering the logic slightly if we can detect the country by the name:* found. E.g. maybe we can assume it's Ireland if we find name:ga.
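As a rough sketch of that refinement: guess the country from which name:* tags are present, then fall back on a hardcoded per-country language order. Only the ga to Ireland hint comes from the note above; the table names, the function names, and the language order for Ireland are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Hint from the note above: a name:ga tag suggests Ireland.
# Any further entries would need to be checked case by case.
LANGUAGE_TO_COUNTRY: Dict[str, str] = {"ga": "ie"}

# Assumption: a hardcoded, ordered list of preferred languages per country.
PREFERRED_LANGUAGES: Dict[str, List[str]] = {"ie": ["en", "ga"]}


def guess_country(tags: Dict[str, str]) -> Optional[str]:
    """Guess a country code from the language suffixes of the name:* tags."""
    for key in tags:
        if key.startswith("name:"):
            country = LANGUAGE_TO_COUNTRY.get(key[len("name:"):])
            if country is not None:
                return country
    return None


def alternative_name_for_guessed_country(tags: Dict[str, str]) -> Optional[str]:
    """Pick the first preferred-language name that differs from the main name."""
    country = guess_country(tags)
    if country is None:
        return None
    main = tags.get("name")
    for code in PREFERRED_LANGUAGES.get(country, []):
        candidate = tags.get(f"name:{code}")
        if candidate and candidate != main:
            return candidate
    return None


# Illustrative tags only (not real OSM data).
irish_street = {
    "name": "Main Street",
    "name:en": "Main Street",
    "name:ga": "An tSráid Mhór",
}
```

When no hint matches, this returns None, so the caller would need some other fallback.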

I'll need to review https://wiki.openstreetmap.org/wiki/Multilingual_names again and decide.

adam-lynch commented 2 years ago

Another idea (which isn't mutually exclusive) is to factor in the user's browser language setting.
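One way this might work, sketched under the assumption that the browser's language list (for example the value of navigator.languages) is passed along as a list of tags like "en-IE". The function name and sample tags are hypothetical.

```python
from typing import Dict, List, Optional


def pick_name_for_user(tags: Dict[str, str], user_languages: List[str]) -> Optional[str]:
    """Return the first name:<code> matching the user's languages that differs from name."""
    main = tags.get("name")
    for language in user_languages:
        # Drop any region subtag: "en-IE" becomes "en".
        code = language.split("-")[0].lower()
        candidate = tags.get(f"name:{code}")
        if candidate and candidate != main:
            return candidate
    return None


# Illustrative tags only (not real OSM data).
tags = {"name": "Main Street", "name:en": "Main Street", "name:ga": "An tSráid Mhór"}
chosen = pick_name_for_user(tags, ["en-IE", "ga"])
```

Because this isn't mutually exclusive with the other ideas, it could run first and fall back to another strategy when it returns None.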

adam-lynch commented 2 years ago

https://nominatim.org/release-docs/develop/api/Reverse/ or https://nominatim.org/release-docs/develop/api/Lookup/ might've been helpful but I don't want to do too many API requests (especially when rate-limited).

Edit: https://operations.osmfoundation.org/policies/nominatim/
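For reference, a sketch of what a single reverse lookup could look like. It only builds the request URL and reads the country code out of an already-decoded JSON response; it makes no network call. The function names are hypothetical, and the use of zoom=3 (country level) and the address.country_code field reflects my reading of the Nominatim reverse API docs rather than anything tested against this project.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"


def reverse_url(lat: float, lon: float) -> str:
    """Build a reverse-geocoding URL asking for country-level detail."""
    params = {"format": "jsonv2", "lat": lat, "lon": lon, "zoom": 3}
    return f"{NOMINATIM_REVERSE}?{urlencode(params)}"


def country_code_from_response(payload: Dict[str, Any]) -> Optional[str]:
    """Read address.country_code from a decoded response, if present."""
    return payload.get("address", {}).get("country_code")


url = reverse_url(53.35, -6.26)
```

Each call is still a separate HTTP request, so the usage policy linked above still applies.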

adam-lynch commented 2 years ago

I think I have to give up on the idea of using the most popular language as the alternative name due to cases like this:

name is "St Paul's Avenue". name:en is "Saint Paul's Avenue". (I've fixed this particular case in OSM)