Closed by unhammer 3 years ago
more minimal:
<?xml version="1.0" encoding="UTF-8"?>
<dictionary type="separable">
<alphabet></alphabet>
<sdefs>
<sdef n="w"/>
<sdef n="adj"/>
</sdefs>
<pardefs>
<pardef n="meh">
<e><i>meh<s n="adj"/><t/><j/></i></e>
</pardef>
</pardefs>
<section id="main" type="standard">
<e c="override below rule if w before">
<i><w/><s n="w"/><t/><j/></i>
<i>D<s n="adj"/><t/><j/></i>
</e>
<e c="drop DROP and LEFT→RIGHT">
<p><l>D<t/><j/></l> <r></r></p>
<p><l>L</l> <r>R</r></p> <i><t/><j/></i>
</e>
</section>
</dictionary>
so we can view the fst:
$ lsx-comp lr apertium-nno-nob.nob-nno.lsx nob-nno.autoseq.bin
main@standard 14 19
$ lt-print nob-nno.autoseq.bin > seq.att
$ printf 'read att seq.att\nview\n' | foma
Seems like the problem is that the two paths get merged in the beginning there – why does that happen?
The path for the second rule should just be (no optional side-tracking into ANY_CHAR).
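Without the graphical view, one way to look for the place where the paths merge is to read seq.att directly. The following is only a sketch, and it assumes that the lt-print output is tab-separated with source state, target state, input symbol and output symbol in the first four columns (plus an optional weight), that shorter lines mark final states, and that state 0 is the initial state. The helper names parse_att and merge_points are made up for this example and are not part of lttoolbox or foma.

```python
from collections import defaultdict


def parse_att(text):
    """Parse AT&T-format transducer text into (arcs, finals).

    arcs maps a source state to a list of (target, input, output) triples.
    Assumes a single section; "--" separator lines are simply skipped.
    """
    arcs = defaultdict(list)
    finals = set()
    for line in text.splitlines():
        if not line.strip() or line.strip() == "--":
            continue  # blank line or section separator
        fields = line.split("\t")
        if len(fields) >= 4:
            # transition line: src, tgt, input, output[, weight]
            src, tgt, sym_in, sym_out = fields[:4]
            arcs[int(src)].append((int(tgt), sym_in, sym_out))
        else:
            # final-state line: state[, weight]
            finals.add(int(fields[0]))
    return arcs, finals


def merge_points(arcs):
    """Return states reached from more than one distinct other state."""
    sources = defaultdict(set)
    for src, outgoing in arcs.items():
        for tgt, _sym_in, _sym_out in outgoing:
            if tgt != src:  # ignore self-loops
                sources[tgt].add(src)
    return sorted(s for s, srcs in sources.items() if len(srcs) > 1)
```

After `arcs, finals = parse_att(open("seq.att", encoding="utf-8").read())`, the list `arcs[0]` holds the arcs leaving the start state, and `merge_points(arcs)` lists the states where separate paths come together, which can then be compared against the foma picture.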
It seems like a <w/> at the start of a rule can make the analyser move its position into a lexical unit even if the rule doesn't end up fully matching, allowing other rules to match from that point on.
apertium-nno-nob.nob-nno.lsx:
None of the entries should've matched here, yet it seems like we had a partial match on the first one and then only backtracked to where the second one was able to start matching (instead of backtracking outside of the word, to the ^).
(thanks @victoria-tro for reporting)
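The suspected behaviour can be illustrated with a toy model. This is not how lsx-proc is implemented. It only contrasts two restart policies after a failed match: retrying at the next lexical-unit boundary, or retrying at the very next character, which may be inside a word. The function name, the regex standing in for the second rule, and the input stream are all hypothetical.

```python
import re


def find_matches(stream, pattern, restart_inside_words):
    """Scan stream for pattern and return a list of (start, matched_text).

    A rule may normally start only right after a "^" (the start of a
    lexical unit). With restart_inside_words=True the scanner also tries
    every other position, mimicking a matcher that was left inside a word.
    """
    rx = re.compile(pattern)
    hits = []
    pos = 0
    while pos < len(stream):
        at_boundary = pos > 0 and stream[pos - 1] == "^"
        m = rx.match(stream, pos) if (restart_inside_words or at_boundary) else None
        if m and m.end() > pos:
            hits.append((pos, m.group(0)))
            pos = m.end()
        else:
            pos += 1
    return hits


# Hypothetical stand-in for the "drop D, L -> R" rule: D<adj>, then a word starting with L.
RULE = r"D<adj>\$ \^L"
# The first word is "xD", not "D", so the rule should not apply.
STREAM = "^xD<adj>$ ^L<n>$"

print(find_matches(STREAM, RULE, restart_inside_words=False))  # []
print(find_matches(STREAM, RULE, restart_inside_words=True))   # [(2, 'D<adj>$ ^L')]
```

With the strict policy nothing matches. With the loose policy the rule fires from the middle of the first word, which is the kind of match that should not have happened here.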