Closed jeutzsch closed 2 days ago
@flammie could you have a look? Also @Trondtr?
Sorry, I forgot to push this earlier. The Unicode Normalisation Form filter is generated automatically nowadays, and we simply applied the relax filters to both desc automata. I think it makes sense not to have it on the generator, so I've removed it in giella-core.
When generating a word form using $HLOOKUP $GTHOME/langs/lang-sje/src/fst/generator-gt-desc.hfstol, any character with a diacritic triggers two outputs: one with the single, precomposed Unicode character, and one with the decomposed sequence of two Unicode characters (base + combining diacritic). For example:
gähppe+A+Sg+Nom
gähppe+A+Sg+Nom gähppe (precomposed output, ä as U+00E4)
gähppe+A+Sg+Nom gähppe (base + combining output, a followed by U+0308)
If there are two characters with diacritics, then there are four outputs:
härrá+N+Sg+Nom
härrá+N+Sg+Nom härrá
härrá+N+Sg+Nom härrá
härrá+N+Sg+Nom härrá
härrá+N+Sg+Nom härrá
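The doubling can be reproduced outside the FST. The following is a minimal Python sketch, not part of the giella-core build: it assumes the generator emits every mix of precomposed (NFC) and decomposed (NFD) characters, which matches the two and four outputs reported above. The function names `normalisation_variants` and `dedupe_by_nfc` are hypothetical helpers made up for this illustration.

```python
import itertools
import unicodedata


def normalisation_variants(word):
    """Return every mix of precomposed and decomposed characters for word."""
    # Work from the fully precomposed (NFC) form, one code point per letter.
    composed = unicodedata.normalize("NFC", word)
    choices = []
    for ch in composed:
        decomposed = unicodedata.normalize("NFD", ch)
        # Letters without a diacritic have only one spelling.
        choices.append((ch,) if decomposed == ch else (ch, decomposed))
    return ["".join(combo) for combo in itertools.product(*choices)]


def dedupe_by_nfc(outputs):
    """Collapse outputs that differ only in normalisation form."""
    seen = {}
    for form in outputs:
        # dict keys keep first-seen order and drop repeats.
        seen.setdefault(unicodedata.normalize("NFC", form), None)
    return list(seen)


# "härrá" written with escapes so the source file's own encoding cannot interfere.
variants = normalisation_variants("h\u00e4rr\u00e1")
print(len(variants))
print(dedupe_by_nfc(variants))
```

With n letters carrying a diacritic, this yields 2^n strings that look identical on screen. As a stopgap on the consumer side, normalising the generator output to NFC and dropping repeats, as `dedupe_by_nfc` does, leaves one form per word.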
I've tried modifying spellrelax.regex, but that didn't change anything.