Closed unhammer closed 5 years ago
If you have `legge# opp til` in your monolingual analyser and try to analyse the input `legge opp<br/>blah` in HTML format, lt-proc will shift the `<br/>` into the middle of the analysis:

    $ echo 'legge opp<br/>blah' | apertium-deshtml
    legge opp[<br\/>]blah.[][ ]

↑ here it's still at the end

    $ echo 'legge opp<br/>blah' | apertium-deshtml | lt-proc -we ../apertium-nno-nob/nob-nno.automorf.bin
    ^legge/legge<vblex><inf>$[<br\/>]^opp/opp<pr>/opp<adv>/oppe<vblex><imp>$ ^blah/*blah$^./.<sent><clb>$[][ ]

but ↑ here it's in the middle of the multiword.
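To make the position of the superblank easier to see, the lt-proc output can be split into lexical units and superblanks. The following is only a rough sketch: `TOKEN` and `tokens` are hypothetical helper names, not part of any Apertium tool, and the regex assumes the usual stream conventions (`^...$` for lexical units, `[...]` for superblanks, backslash as escape character).

```python
import re

# Lexical units are ^...$ and superblanks are [...];
# a backslash escapes the character that follows it.
TOKEN = re.compile(r"""
    \^(?P<lu>(?:\\.|[^\\$])*)\$        # lexical unit
  | \[(?P<blank>(?:\\.|[^\\\]])*)\]    # superblank
""", re.VERBOSE | re.DOTALL)


def tokens(stream):
    """Yield ('lu', surface) or ('blank', text) for each unit in the stream."""
    for m in TOKEN.finditer(stream):
        if m.group("lu") is not None:
            # Keep only the surface form (simplification: ignores escaped slashes).
            yield ("lu", m.group("lu").split("/", 1)[0])
        else:
            yield ("blank", m.group("blank"))


# The lt-proc output from this report, copied verbatim.
out = r"^legge/legge<vblex><inf>$[<br\/>]^opp/opp<pr>/opp<adv>/oppe<vblex><imp>$ ^blah/*blah$^./.<sent><clb>$[][ ]"

for kind, text in tokens(out):
    print(kind, repr(text))
```

The first three tokens come out as the unit `legge`, the `<br\/>` superblank, then the unit `opp`, so the blank sits between the two words of the input rather than after `opp`.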
From the code, it seems like what happens is that we

1. read `legge `: we've now seen a nonalphabetic after a final, so we record the index `last=6` and `lf=/legge<vblex><inf>`.
2. read `legge opp[<br/>]`, where we still don't know if we'll see `til` at the right, so `[<br/>]` ends up in `blankqueue`.
3. read `b`, meaning we can't go further in that MWE, so we have to skip back to the last full analysis.
4. call `printWord` with surface form `legge`.
5. call `printSpace`, which completely flushes `blankqueue` if there is one, otherwise outputs a space.
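The steps above can be mimicked with a toy longest-match loop. This is a minimal sketch under stated assumptions, not the lttoolbox code: `LEXICON`, `analyse`, `char_at` and `is_prefix` are hypothetical names, the lexicon holds surface forms only (so the output has no analyses), and the positions it tracks do not correspond to lt-proc's buffer indices. Only `blankqueue` and the flush-everything behaviour of `print_space` are taken from the description.

```python
import re
from collections import deque

# Toy lexicon of surface forms; "legge opp til" stands in for `legge# opp til`.
LEXICON = {"legge", "opp", "legge opp til"}


def is_prefix(s):
    return any(entry.startswith(s) for entry in LEXICON)


def analyse(text):
    # Split the input into single characters and whole [superblanks].
    items = re.findall(r"\[[^\]]*\]|.", text)
    blankqueue = deque()
    queued = set()  # indices of superblanks already pushed to the queue
    out = []

    def char_at(k):
        """Character the matcher sees at k; a superblank looks like a space."""
        if items[k].startswith("["):
            if k not in queued:  # queue the blank the first time it is read
                queued.add(k)
                blankqueue.append(items[k])
            return " "
        return items[k]

    def print_space():
        """Flush the whole queue if it is non-empty, otherwise emit a space."""
        if blankqueue:
            out.extend(blankqueue)
            blankqueue.clear()
        else:
            out.append(" ")

    i = 0
    while i < len(items):
        c = char_at(i)
        if c == " ":
            print_space()
            i += 1
            continue
        if not c.isalpha():
            out.append(c)
            i += 1
            continue
        sf, k, last = "", i, None
        while True:
            c = char_at(k) if k < len(items) else ""
            if sf in LEXICON and not c.isalpha():
                last = k  # a full match ends just before position k
            if c == "" or not is_prefix(sf + c):
                break
            sf += c
            k += 1
        if last is None:
            # Unknown word: consume the alphabetic run and mark it with '*'.
            k = i
            while k < len(items) and char_at(k).isalpha():
                k += 1
            word = "".join(items[i:k])
            out.append(f"^{word}/*{word}$")
            i = k
        else:
            # Skip back to the last full match and print that word.
            word = "".join(char_at(j) for j in range(i, last))
            out.append(f"^{word}$")
            i = last
    return "".join(out)


print(analyse("legge opp[<br/>]blah"))      # ^legge$[<br/>]^opp$ ^blah/*blah$
print(analyse("legge opp til[<br/>]blah"))  # ^legge opp til$[<br/>]^blah/*blah$
```

In the first call the superblank is queued while the matcher is still hoping for `til`, and the first `print_space` after backtracking to `legge` flushes it, which reproduces the misplaced `<br/>`. In the second call the multiword completes, so the blank is flushed after it as expected.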