Open unhammer opened 6 years ago
I lost my laptop three weeks ago, so it'll be a while before I can look at this.
On Thursday, 25 October 2018, Kevin Brubeck Unhammer <notifications@github.com> wrote:
Assigned #35 https://github.com/apertium/lttoolbox/issues/35 to @jimregan https://github.com/jimregan.
ouch :((
I added some tests in fd6e6dc. It turns out to be problematic if we start generating ^KAKE<n><f><pl><def>$ and see a possible path that starts with ^K but then only ends up in other analyses (e.g. ^KK<np>$). Then we end up with #KAKE where we should have tried a lowercased analysis.
But if there were no such garden paths, ^KAKE<n><f><pl><def>$ does give an analysis; see the difference between the two test dix'es added: https://github.com/apertium/lttoolbox/commit/fd6e6dca7562200e182d77b65bc759380d95df08#diff-839e968af7bf80a08ea4d97247cbe7fdR1
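To make the garden-path problem concrete, here is a toy model in Python. It is only a sketch under assumptions: the dictionary is a plain character trie over surface strings (no tags, no real transducer), and `greedy_lookup` and `parallel_lookup` are hypothetical names, not lttoolbox functions. The greedy version commits to the exact-case arc whenever one exists, which is one way the #KAKE behaviour could arise; the parallel version keeps both the exact-case and the lowercased state alive.

```python
def build_trie(words):
    """Build a character trie; the key "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def greedy_lookup(trie, word):
    """Prefer the exact-case arc at each step and never backtrack."""
    node = trie
    for ch in word:
        if ch in node:
            node = node[ch]
        elif ch.lower() in node:
            node = node[ch.lower()]
        else:
            return False
    return "$" in node


def parallel_lookup(trie, word):
    """Follow exact-case and lowercased arcs side by side."""
    states = [trie]
    for ch in word:
        next_states = []
        for node in states:
            # A set avoids following the same arc twice for lowercase input.
            for candidate in {ch, ch.lower()}:
                if candidate in node:
                    next_states.append(node[candidate])
        states = next_states
        if not states:
            return False
    return any("$" in node for node in states)


with_garden_path = build_trie(["kake", "KK"])
without_garden_path = build_trie(["kake"])

print(greedy_lookup(without_garden_path, "KAKE"))  # True
print(greedy_lookup(with_garden_path, "KAKE"))     # False: stuck on the "KK" path
print(parallel_lookup(with_garden_path, "KAKE"))   # True
```

The only difference between the two dictionaries is the extra entry starting with K, which mirrors the two test dix'es mentioned above.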
@mr-martian Do you think this is solvable? I'd love to have a solution for this (but in bilingual mode, lt-proc -b), so that I can e.g. have a dix with
<e> <re>[a-zA-Z]+</re><p><l></l><r><s n="np"/></r></p></e>
<e> <i>med</i> <p><l></l><r><s n="pr"/></r></p></e>
and get
$ echo '^Med<pr>$ ^AbCd<np>$' |lt-proc -C -b nob-nno.autogen.bin
^Med<pr>/Med$ ^AbCd<np>/AbCd$
Currently, we can get either the one or the other:
$ echo '^Med<pr>$ ^AbCd<np>$' |lt-proc -C tmp.bin # eats Med
AbCd
$ echo '^Med<pr>$ ^AbCd<np>$' |lt-proc -b tmp.bin # includes extra "Abcd"
^Med<pr>/Med$ ^AbCd<np>/AbCd/Abcd$
$ echo '^Med<pr>$ ^AbCd<np>$' |lt-proc -c -g tmp.bin # fails to generate Med since lemma is lowercase
#Med AbCd
Possibly related to https://github.com/apertium/lttoolbox/issues/167
If the dictionary has
then we get
I would like it to just fall back to "normal" generation for words it can't find an exact-case match for, i.e.
while still retaining the -C functionality for words it can find exact matches for
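The requested behaviour could be sketched roughly as follows. This is a toy Python model, not lttoolbox's C++ code: `entries` is a hypothetical flat mapping from lexical form to surface form, `generate` and `restore_case` are made-up names, and the case-restoring rule (all-caps, else initial capital) is an assumption about what "normal" generation does. Unknown forms are returned with a leading #, as in the outputs quoted in this thread.

```python
import re

# Lemma followed by zero or more <tag> groups, e.g. "Med<pr>".
LEXICAL_FORM = re.compile(r"^([^<]*)((?:<[^>]+>)*)$")


def restore_case(source_lemma, surface):
    """Copy the broad case pattern of the source lemma onto the surface form."""
    if len(source_lemma) > 1 and source_lemma.isupper():
        return surface.upper()
    if source_lemma[:1].isupper():
        return surface[:1].upper() + surface[1:]
    return surface


def generate(entries, lexical_form):
    """Try an exact-case match first, then fall back to a lowercased lookup."""
    # Step 1: exact-case lookup (the -C style behaviour).
    if lexical_form in entries:
        return entries[lexical_form]

    match = LEXICAL_FORM.match(lexical_form)
    if match is None:
        return "#" + lexical_form
    lemma, tags = match.group(1), match.group(2)

    # Step 2: fallback, look up the lowercased lemma and restore the case.
    fallback_key = lemma.lower() + tags
    if fallback_key in entries:
        return restore_case(lemma, entries[fallback_key])

    # Step 3: nothing matched, mark the form as unknown.
    return "#" + lemma


entries = {"med<pr>": "med", "AbCd<np>": "AbCd"}
print(generate(entries, "Med<pr>"))   # Med
print(generate(entries, "AbCd<np>"))  # AbCd
```

Trying the exact-case key first means a mixed-case entry such as AbCd is never normalised to Abcd, while Med still generates from the lowercase lemma.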