apertium / lttoolbox

Finite state compiler, processor and helper tools used by apertium
http://wiki.apertium.org/wiki/Lttoolbox
GNU General Public License v2.0

have lt-comp split multichar symbols that lt-proc won't parse #111

Closed: mr-martian closed this issue 2 years ago

mr-martian commented 3 years ago

This has been a particular problem because of how lexd handles combining diacritics, but it applies in any case where an att file has a multichar symbol that doesn't match <[^<>]*>: lt-comp will add the symbol to the alphabet, but lt-proc will ignore it. I think the simplest solution is to have lt-comp split such multichar symbols into multiple transitions (possibly triggered by a command-line option rather than always).
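To illustrate the proposal, here is a rough sketch in Python (not lt-comp's actual C++ code) of how one transition with a multichar symbol could be expanded into a chain of transitions. The names `split_symbol`, `split_transition` and `fresh_state` are hypothetical. It assumes that `@0@` is the epsilon symbol in the att file, that any `@...@` symbol should be left alone, and that weights can be ignored for the purpose of the example.

```python
import re
from itertools import count, zip_longest

TAG = re.compile(r"<[^<>]*>")  # the symbol shape lt-proc does parse
EPSILON = "@0@"                # assumed epsilon spelling in the att file


def split_symbol(sym):
    """Return the symbol unchanged (as a 1-item list) if it is a single
    character, a <tag>, or an @...@ special; otherwise split into characters."""
    if len(sym) <= 1 or TAG.fullmatch(sym):
        return [sym]
    if sym.startswith("@") and sym.endswith("@"):
        return [sym]
    return list(sym)


def split_transition(src, dst, isym, osym, fresh_state):
    """Expand one att transition into a chain of transitions.

    fresh_state is a callable returning an unused state id. The shorter
    side is padded with epsilon so both sides have the same length.
    """
    pairs = list(zip_longest(split_symbol(isym), split_symbol(osym),
                             fillvalue=EPSILON))
    arcs = []
    cur = src
    for i, (a, b) in enumerate(pairs):
        # The last arc of the chain lands on the original target state.
        nxt = dst if i == len(pairs) - 1 else fresh_state()
        arcs.append((cur, nxt, a, b))
        cur = nxt
    return arcs


if __name__ == "__main__":
    ids = count(100)  # hypothetical pool of unused state ids
    # "a" + U+0301 COMBINING ACUTE ACCENT as a single multichar symbol
    for arc in split_transition(0, 1, "a\u0301", "a\u0301", lambda: next(ids)):
        print("\t".join(str(field) for field in arc))
```

A real implementation would also need to decide where to put the weight (for example on the first arc of the chain) and whether the splitting is always on or enabled by an option, as suggested above.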

mr-martian commented 2 years ago

See also https://github.com/apertium/apertium-yid/issues/3