So if "kake" and "formel 1-" are both in the dix, we can
analyse "formel 1-kake" as a compound. All the spaces have to be in
a single left-part (so "kakeformel 1" isn't supported, nor is
"formel 1-formel 1").
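
To make the restriction concrete, here is a toy model of it in Python. It is only a sketch of the rule as described above, not how lt-proc does it: the function `split_compound` and the `lefts`/`rights` lists are made up for illustration, and the real analyser works on the compiled FST rather than on string lists.

```python
def split_compound(surface, lefts, rights):
    """Toy splitter: return a list of parts, or None if no split fits the rule.

    Hypothetical helper, not part of lttoolbox. `lefts` stands in for
    entries allowed as compound left-parts, `rights` for entries allowed
    as the final part.
    """
    def search(rest, parts, space_used):
        # The final part must be a known right-part and must not contain a space.
        if parts and rest in rights and " " not in rest:
            return parts + [rest]
        for left in lefts:
            if rest.startswith(left) and len(left) < len(rest):
                has_space = " " in left
                # At most one left-part may contain spaces.
                if has_space and space_used:
                    continue
                found = search(rest[len(left):], parts + [left],
                               space_used or has_space)
                if found is not None:
                    return found
        return None

    return search(surface, [], False)


# Illustrative word lists, mirroring the examples in the description.
lefts = ["kake", "formel 1-"]
rights = ["kake", "formel 1"]

print(split_compound("formel 1-kake", lefts, rights))
print(split_compound("kakeformel 1", lefts, rights))
print(split_compound("formel 1-formel 1", lefts, rights))
```

In this model the first call finds a split and the other two return None, because the space would end up in the final part.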
This only takes effect when lt-proc is run with the -e option.
This closes #138
It makes analysis slightly slower when compounding is in effect (and your FST has multiwords with compounding), but lt-proc is far from being the bottleneck. For nob-dan the difference is negligible: 13.8s with this change vs 13.6s before, on 50k lines. For nob-nno it's 19.9s vs 17.9s on 50k lines.
For Norwegian, these types of compounds nearly always have a dash, but it was simpler to implement without that requirement, so .dix writers get to (have to) decide whether they want entries containing spaces to be able to take cp-L without a dash.
Caveat: If the dix already has entries with spaces which allow cp-L but shouldn't, they'll now start compounding. That should be fixed in the dix anyway, but it will alter translations. I think I maintain most of the .dix files that use compounding though :)