apertium / apertium-recursive

Recursive structural transfer module for Apertium
https://wiki.apertium.org/wiki/Apertium-recursive
GNU General Public License v3.0

How to match non-ascii lemmas? #75

Closed: unhammer closed this issue 2 years ago

unhammer commented 3 years ago
$ cat foo.rtx
ij: _;
IJ: _;

IJ ->
    %ij
    (
        if (1.lemh/sl = "a" )
        { MATCHED@a.[1.lemh/sl] }
        el-if (1.lemh/sl = "æ" )
        { MATCHED@æ.[1.lemh/sl] }
        else
        { NO_MATCH@x.[1.lemh/sl] }
    )
    ;
$ rtx-comp foo.rtx foo.bin && echo '^a<ij>/a<ij>$ ^æ<ij>/æ<ij>$' | rtx-proc -r foo.bin

Applying rule 1 (line 5): ^a<ij>/a<ij>$

Applying output rule 0 (line 5): a<IJ> -> ^a<ij>/a<ij>$

No rule specified: ^MATCHED<a>a$
^MATCHED<a>$
Applying rule 1 (line 5): ^æ<ij>/æ<ij>$

Applying output rule 0 (line 5): æ<IJ> -> ^æ<ij>/æ<ij>$

No rule specified: ^NO_MATCH<x>æ$
^NO_MATCH<x>$
unhammer commented 3 years ago

Is there some special trick?

unhammer commented 3 years ago
$ cat bar.rtx
ij: _;
IJ: _;

IJ ->
    ij
    ?(1.lemh/sl = "æ" )
    { MATCHED@x }
    ;
$ rtx-comp bar.rtx bar.bin && echo '^æ<ij>/æ<ij>$' | rtx-proc -s  bar.bin
int 1
pushinput
string
 -> lemh
sourceclip
 -> æ
dup
string
 ->
equal
 -> false
jumponfalse
 -> false, jumping
string
 -> æ
equal
 -> false
jumpontrue
 -> false
rejectrule
^æ<ij>$

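There's our æ: the source clip comes through as æ, but the string literal compiled from the rule is æ, so `equal` returns false and the rule is rejected.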

unhammer commented 3 years ago
$ echo 'å:_; å -> å{1};' > x; rtx-comp -s x b
å -> å

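So every non-ASCII character in the rule file is misdecoded: å is the two UTF-8 bytes of å (C3 A5), each read as a separate Latin-1 character.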
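
The pattern in the output above can be reproduced outside Apertium. The following is only a sketch in Python, not the rtx-comp code: it assumes the garbling comes from UTF-8 bytes being read one byte per character (Latin-1 style), which matches the observed output but is not confirmed in this thread. The helper name `misdecode_utf8_as_latin1` is made up for illustration.

```python
def misdecode_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as Latin-1.

    Hypothetical helper: it imitates the assumed failure mode, where each
    byte of a multi-byte UTF-8 sequence becomes its own character.
    """
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    for ch in ("a", "å", "æ"):
        garbled = misdecode_utf8_as_latin1(ch)
        # ASCII survives unchanged; non-ASCII turns into two characters
        # and no longer compares equal to the original string.
        print(repr(ch), "->", repr(garbled), "equal:", ch == garbled)
```

Under that assumption, "a" still compares equal after the round trip, while "å" and "æ" become "Ã¥" and "Ã¦". That would explain why the `"a"` branch in foo.rtx matches and the `"æ"` branch never does.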

unhammer commented 2 years ago

https://github.com/apertium/apertium-recursive/commit/15d4cf28f5a8a2e8338e41184497858511bd4b83 seems to have fixed it