giggls / osml10n

Localization functions for Openstreetmap
https://tile.openstreetmap.de
GNU General Public License v3.0
14 stars 7 forks source link

Inconsistent changes in street name rewriting #8

Closed joto closed 2 years ago

joto commented 2 years ago

I am comparing results from different runs and am getting inconsistent results, i.e. sometimes one, sometimes a different result is returned.

- "<U+202A>пл.дь Ленина|ploschtschad Lenina<U+202C>"
+ "<U+202A>пл. Ленина|ploschtschad Lenina<U+202C>"

The reason are these lines in the RU/UK rewrites: https://github.com/giggls/osml10n/blob/master/lua_osml10/osml10n/street_abbrev.lua#L157 https://github.com/giggls/osml10n/blob/master/lua_osml10/osml10n/street_abbrev.lua#L171

It depends on the (undefined) order of table iteration, which one is used: https://github.com/giggls/osml10n/blob/master/lua_osml10/osml10n/street_abbrev.lua#L209 https://github.com/giggls/osml10n/blob/master/lua_osml10/osml10n/street_abbrev.lua#L217 https://github.com/giggls/osml10n/blob/master/lua_osml10/osml10n/street_abbrev.lua#L225

There are other cases in the code when a for loop is used to iterate over tags or languages which might also lead to similar problems. I have another case where these tags:

"name:en"            = "Lantau Trail Section 3"
"name:yue"           = "<U+9CF3><U+51F0><U+5F91>"
"name:zh"            = "<U+9CF3><U+51F0><U+5F91><U+7B2C>3<U+6BB5>"

do result in one of these:

- "<U+202A><U+9CF3><U+51F0><U+5F91><U+7B2C>3<U+6BB5>|Lantau Trail Section 3<U+202C>"
+ "<U+202A><U+9CF3><U+51F0><U+5F91>|Lantau Trail Section 3<U+202C>"

I am not exactly sure what's happening here, but it looks like sometimes the yue und sometimes the zh language is chosen for that street name label.

giggls commented 2 years ago

Frankly I have no idea how to do this the right way. Can you build a test which will cause such problems in a reproducable way?

joto commented 2 years ago

Can you build a test which will cause such problems in a reproducable way?

No, that's the problem. That's inherently not reproducable, because the iteration order for a table is random.

The way to fix this is to force order in some way. For instance by iterating over the keys always in alphabetical order. But that is, essentially arbitrary, too. For instance, which language is chosen will depend on ASCII order, not on something actually in the data.

There are some pointers here: http://lua-users.org/wiki/OrderedTable how to approach this.

giggls commented 2 years ago

argh lua will not become my friend any time soon.

Is there a no way to define these tables in a somewhat static way?

Frankly I tend to replace those for-loops by unrolled function calls.

giggls commented 2 years ago

I pushed an unrolled version with fixed call order. The reason why this bug exists after all is that I wanted to make the code less redundant which obviously did not work.

joto commented 2 years ago

I believe this fixes some of the problems but not all. Here is the diff between two runs of a largish test file after you above change:

street-abbrev-diff.txt

giggls commented 2 years ago

argh found an evil bug there. I need to ignore name:left* and name:right* altogether.

Which functions did you call?

E.g. for https://www.openstreetmap.org/way/967034837 a transcription should never happen.

joto commented 2 years ago

This is basically the function I am calling for every object:

function process(objtype, object)
    local label = osml10n.get_localized_name_from_tags(objtype .. object.id, object.tags, 'de', object.bbox)
    if label and #label > 0 then
        object.tags['osml10n_label'] = label
    end

    if object.tags.name then
        local trans = osml10n.geo_transcript(objtype .. object.id, object.tags.name, object.bbox)
        if trans and #trans > 0 then
            object.tags['osml10n_name'] = trans
        end
    end

    if object.tags.highway then
        local trans = osml10n.get_streetname_from_tags(objtype .. object.id, object.tags, true, false, '|', 'de', object.bbox)
        if trans and #trans > 0 then
            object.tags['osml10n_streetname'] = trans
        end
    end

    return object.tags
end
giggls commented 2 years ago

Hm it would probably be a good idea to build a command line lua tool which will download a given way, node or relation from osm.org and perform those 3 functions on it.

giggls commented 2 years ago

BTW, the 3 function i would export for public use are:

giggls commented 2 years ago

At least this is what I use in the old code. For lua it will probably be better to use another wrapper for get_names_from_tags as this will output an array of two values.

My original intension was to put this array into a PostgreSQL column. This way it would be possible to use a custom separator for the two names in the renderer.

giggls commented 2 years ago

Do you consider this one as fixed for now?

joto commented 2 years ago

I have not seen any problems any more with this, so lets call it fixed. Closing.