rule-initial <w/> can make other rules match

apertium / apertium-separable

Module for reordering separable/discontiguous multiwords.

GNU General Public License v3.0

It seems like a <w/> at the start of a rule can make the analyser move its position into a lexical unit even if the rule doesn't end up fully matching, allowing other rules to match from that point on.

apertium-nno-nob.nob-nno.lsx:

<?xml version="1.0" encoding="UTF-8"?>
<dictionary type="separable">

  <alphabet></alphabet>

  <sdefs>
    <sdef n="adj"/>
  </sdefs>

  <pardefs>
    <pardef n="meh">
      <e><i>meh<s n="adj"/><t/><j/></i></e>
    </pardef>
  </pardefs>

  <section id="main" type="standard">

    <e c="override below rule if adj before">
      <i><w/>stuffnotininput<s n="adj"/><t/><j/></i>
      <i>DROP<s n="adj"/><t/><j/></i>
    </e>

    <e c="drop DROP and LEFT→RIGHT">
      <p><l>DROP<t/><j/></l> <r></r></p>
      <p><l>LEFT</l>           <r>RIGHT</r></p> <i><t/><j/></i>
    </e>
  </section>

</dictionary>

$ lsx-comp lr apertium-nno-nob.nob-nno.lsx nob-nno.autoseq.bin
main@standard 39 44

$ echo '^keptDROP<adj><sg>$ ^LEFT<n><sg>$' | lsx-proc nob-nno.autoseq.bin
^keptRIGHT<n><sg>$

None of the entries should have matched here, so the input should have passed through unchanged. Yet it seems like we had a partial match on the first one and then backtracked only to the point where the second one was able to start matching (inside the word, at DROP), instead of backtracking to outside the word, before its ^.
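To make the symptom concrete, here is a toy model in Python. It is only a sketch and says nothing about how lsx-proc is implemented: the second rule is approximated as a regular expression over the raw stream, and the names `TAGS`, `ANCHORED`, `UNANCHORED` and `apply_rule` are made up for this illustration. The unanchored variant may start matching in the middle of a lexical unit and reproduces the output above; the anchored variant may only start at a `^` and leaves the input alone.

```python
import re

# Zero or more tags such as <adj><sg> (toy approximation of <t/>).
TAGS = r"((?:<[^<>]+>)*)"

# Toy version of the "drop DROP and LEFT→RIGHT" rule, written as a regex
# over the raw stream. This is an assumption for illustration only,
# not the transducer that lsx-comp builds.
ANCHORED = re.compile(r"\^DROP" + TAGS + r"\$ \^LEFT" + TAGS + r"\$")
UNANCHORED = re.compile(r"DROP" + TAGS + r"\$ \^LEFT" + TAGS + r"\$")


def apply_rule(stream: str, anchored: bool) -> str:
    """Apply the toy rule; `anchored` forces a match to start at a ^."""
    if anchored:
        # The whole first unit is dropped, the second keeps its tags.
        return ANCHORED.sub(lambda m: "^RIGHT" + m.group(2) + "$", stream)
    # The match may begin mid-word, so whatever precedes DROP survives.
    return UNANCHORED.sub(lambda m: "RIGHT" + m.group(2) + "$", stream)


stream = "^keptDROP<adj><sg>$ ^LEFT<n><sg>$"
print(apply_rule(stream, anchored=False))  # ^keptRIGHT<n><sg>$
print(apply_rule(stream, anchored=True))   # ^keptDROP<adj><sg>$ ^LEFT<n><sg>$
```

The first printed line is the output reported above; the second is the unchanged input, which is what should come out when no entry matches.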

(thanks @victoria-tro for reporting)

<?xml version="1.0" encoding="UTF-8"?>
<dictionary type="separable">

  <alphabet></alphabet>

  <sdefs>
    <sdef n="w"/>
    <sdef n="adj"/>
  </sdefs>

  <pardefs>
    <pardef n="meh">
      <e><i>meh<s n="adj"/><t/><j/></i></e>
    </pardef>
  </pardefs>

  <section id="main" type="standard">

    <e c="override below rule if w before">
      <i><w/><s n="w"/><t/><j/></i>
      <i>D<s n="adj"/><t/><j/></i>
    </e>

    <e c="drop DROP and LEFT→RIGHT">
      <p><l>D<t/><j/></l> <r></r></p>
      <p><l>L</l> <r>R</r></p> <i><t/><j/></i>
    </e>
  </section>

</dictionary>

rule-initial <w/> can make other rules match #37