Closed albbas closed 7 years ago
Date: 2013-11-14 16:27:28 +0100
From: Trond Trosterud
To repeat:
Add the following to the bottom of stems/nouns.lexc:
xîxa INDECL "Trond testing" ;
xixâ INDECL "Trond testing" ;
xêxa INDECL "Trond testing" ;
xâxa INDECL "Trond testing" ;
xôxa INDECL "Trond testing" ;
xâxi INDECL "Trond testing" ;
wâki INDECL "Trond testing" ;
xâwi INDECL "Trond testing" ;
wâwo INDECL "Trond testing" ;
Then, test:
wawi wawi wâwi+N+IN+Sg
wawo wawo wâwo+N+IN+Sg
wâwo wâwo wâwo +?
wâwi wâwi wâwi +?
xâwi xâwi xâwi+N+IN+Sg
xawi xawi xâwi+N+IN+Sg
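The strange thing: the spellrelax rule â (->) a in the src/orthography/spellrelax.regex file should treat every a alike (as a candidate for â), but that happens only as long as the a is not placed between two w's. When the â stands between two w's, the relaxed spelling with a is analysed, but the correct spelling with â is not.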
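To make the expectation concrete, here is a minimal plain-Python model of what an optional rewrite like â (->) a is supposed to allow. It is only a sketch of the intended behaviour, not of how xfst compiles the rule; the function name `relaxed_spellings` is made up for illustration, and the input is assumed to use precomposed (NFC) characters.

```python
import unicodedata
from itertools import product


def relaxed_spellings(word, strict="\u00e2", relaxed="a"):
    """Return every surface spelling allowed by an optional strict -> relaxed rewrite.

    Hypothetical helper: models the *expected* effect of the spellrelax rule,
    it does not call or emulate xfst itself.
    """
    # Normalise so that a decomposed a + combining circumflex is treated as one letter.
    word = unicodedata.normalize("NFC", word)
    # Each occurrence of the strict letter may independently stay or be relaxed.
    options = [(ch, relaxed) if ch == strict else (ch,) for ch in word]
    return {"".join(choice) for choice in product(*options)}


for lemma in ("x\u00e2wi", "w\u00e2wi", "w\u00e2wo"):
    print(lemma, sorted(relaxed_spellings(lemma)))
```

Under this model the neighbouring letters play no role, so wâwi and wâwo should be accepted in both spellings, exactly like xâwi.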
So, what happens?
Date: 2013-12-30 17:20:53 +0100
From: Trond Trosterud
This is a Xerox bug, cf. this report:
On Nov 15, 2013, at 9:46 AM, Trosterud Trond trond.trosterud@uit.no wrote:
An unexpected issue turned up when debugging a problem in our Plains Cree analyser. It turns out that xfst is not able to analyse words with the sequence wâw, but other sequences are ok. When we then add a spellrelax â (->) a at the bottom, it works fine. But when running the same code through hfst, it behaves.
Here we show output without spellrelax (with the spellrelax, wawa and wewa would have been recognised as wâwa and wêwa, respectively).
tf-hsl-m0016:crk ttr000$ hfst-lookup src/analyser-gt-desc.hfst
pisiw
pisiw pisiw+N+AN+Sg 0.000000
wâwa
wâwa wâwa+N+AN+Sg 0.000000
wêwa
wêwa wêwa+N+AN+Sg 0.000000
^C
tf-hsl-m0016:crk ttr000$ lookup src/analyser-gt-desc.xfst
LEXICON LOOK-UP
pisiw
pisiw pisiw+N+AN+Sg
wâwa
wâwa wâwa +?
wêwa
wêwa wêwa+N+AN+Sg
To repeat:
xfst -e "read lexc ctest.lexc"
up wêwa
up wâwa
The source files are available online as well, here:
http://giellatekno.uit.no/doc/lang/crk/PlainsCreeDocumentation.html
We also have a similar case for Russian, but one not containing flag diacritics. Here is the output from our test bench:
YAML test 25: ./N-Ж_Ф-железа_gt-norm.yaml + analyser-gt-norm.hfst - PASS
YAML test 25: ./N-Ж_Ф-железа_gt-norm.yaml + analyser-gt-norm.xfst - FAIL
To rerun with more details, please triple-click, copy and paste the following:
pushd /Users/ttr000/main/langs/rus/test/src/morphology; /opt/local/bin/python3.2 /Users/ttr000/main/gtcore/scripts/morph-test.py -c -i -S xerox --app /Users/ttr000/bin/lookup --gen ./../../../src/generator-gt-norm.xfst --morph ./../../../src/analyser-gt-norm.xfst ./N-Ж_Ф-железа_gt-norm.yaml; popd
The point here is that the same test passes for hfst but fails for xfst (the normal case is of course that the two fst-s agree on the verdict). What happens is that some twolc rule essentially moves stress (exchanges é with e etc.) to the first syllable of the word. This works, except when there is a ë involved. Cf.:
http://giellatekno.uit.no/doc/lang/rus/RussianDocumentation.html
Our situation now is that we run the hfst and xerox tools in parallel (as the test outcome shows), and we are not dependent upon the xerox tools working. They do work for our core languages (the Saami languages), but now and then we thus stumble upon problems like these (cf. the Komi capitalisation issue some while ago).
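Running the two toolchains in parallel boils down to comparing their verdicts input by input. The following is a minimal sketch of such a comparison, not the actual morph-test.py logic. It assumes whitespace-separated lookup output in which a failed analysis is marked by a field ending in `+?`; the function names and the sample lines (reconstructed from the Plains Cree transcript above) are illustrative only.

```python
def failed_inputs(lines):
    """Collect the input words whose lookup line reports a failed analysis (+?)."""
    failed = set()
    for line in lines:
        fields = line.split()
        # The first field is the input word; any later field ending in "+?" marks a failure.
        if fields and any(field.endswith("+?") for field in fields[1:]):
            failed.add(fields[0])
    return failed


def disagreements(hfst_lines, xfst_lines):
    """Return the inputs that exactly one of the two tools fails to analyse."""
    return failed_inputs(hfst_lines) ^ failed_inputs(xfst_lines)


# Sample output lines, reconstructed from the transcript (field separators assumed).
hfst_lines = [
    "pisiw\tpisiw+N+AN+Sg\t0.000000",
    "wâwa\twâwa+N+AN+Sg\t0.000000",
    "wêwa\twêwa+N+AN+Sg\t0.000000",
]
xfst_lines = [
    "pisiw\tpisiw\t+N+AN+Sg",
    "wâwa\twâwa\t+?",
    "wêwa\twêwa\t+N+AN+Sg",
]

print(sorted(disagreements(hfst_lines, xfst_lines)))
```

Using the symmetric difference keeps the check tool-neutral: an input is reported whenever only one of the two analysers rejects it, whichever one that is.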
Thanks for your bug report. Ask me after a few weeks about whether there is already a fix.
Date: 2014-04-21 21:24:36 +0200
From: Trond Trosterud
New letter sent.
Date: 2015-03-26 11:32:58 +0100
From: Sjur Nørstebø Moshagen
Here is another strange Xerox bug:
When multichar symbols are made optional (e.g. for generation), any such symbol ending in the character sequence 'Obj' (without the quotes, with the exact capitalisation as shown) can't be used as part of an input string.
I have checked all possible things in our own source code, like missing multicharacter declarations, non-breaking spaces, missing colons, etc., but everything seems fine. And there are no such problems with any of the other similar tags as far as I can tell.
The test data below is from SMA, with the relevant tag manually changed between different make & test runs to see what is working and what is not:
$ lookup -q src/generator-gt-norm.xfst
Windows+N+Prop+Sg+Nom
Windows+N+Prop+Sg+Nom Windows
Windows+N+Prop+Sem/Abj+Sg+Nom
Windows+N+Prop+Sem/Abj+Sg+Nom Windows
YouTube+N+Prop+Sem/Obj+Sg+Nom
YouTube+N+Prop+Sem/Obj+Sg+Nom YouTube+N+Prop+Sem/Obj+Sg+Nom +?
YouTube+N+Prop+Sg+Nom
YouTube+N+Prop+Sg+Nom YouTube

$ lookup -q src/generator-gt-norm.xfst
Windows+N+Prop+Sg+Nom
Windows+N+Prop+Sg+Nom Windows
Windows+N+Prop+Sem/obj+Sg+Nom
Windows+N+Prop+Sem/obj+Sg+Nom Windows

$ lookup -q src/generator-gt-norm.xfst
Windows+N+Prop+Sg+Nom
Windows+N+Prop+Sg+Nom Windows
Windows+N+Prop+Sem/Object+Sg+Nom
Windows+N+Prop+Sem/Object+Sg+Nom Windows
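The runs above each required editing the tag and rebuilding; the input strings themselves, though, follow one pattern. As a small sketch, the inputs for a set of tag spellings could be generated like this. The helper name `tag_variants` and the script name in the usage note are hypothetical; the tags are the ones tried above.

```python
def tag_variants(lemma, variants, prefix="+N+Prop", suffix="+Sg+Nom"):
    """Yield generator inputs for a lemma: first without, then with each Sem/ tag variant."""
    # Baseline: the optional semantic tag left out entirely.
    yield f"{lemma}{prefix}{suffix}"
    for variant in variants:
        # Same analysis string with the optional tag spelled in the given way.
        yield f"{lemma}{prefix}+Sem/{variant}{suffix}"


for line in tag_variants("Windows", ["Obj", "Abj", "obj", "Object"]):
    print(line)
```

Assuming the script is saved as, say, variants.py, its output can be piped to the generator with `python3 variants.py | lookup -q src/generator-gt-norm.xfst`. Note that each variant still has to exist in the compiled transducer, so this only saves typing, not rebuilding.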
The bug seems to be restricted to the lookup tool: xfst itself handles the same symbol without problems, cf. the following:
Inspect: 1 @U.Cap.Obl@ W i n d o w s 0:+N 0:+Prop 0:+Sem/Obj @U.Cap.Obl@ 0:+Sg 0:+Nom @D.CmpOnly.FALSE@ @D.CmpPref.TRUE@ @D.NeedNoun.ON@ --> Level 18 (final)
Inspect: 1 @U.Cap.Obl@ W i n d o w s 0:+N 0:+Prop @U.Cap.Obl@ 0:+Sg 0:+Nom @D.CmpOnly.FALSE@ @D.CmpPref.TRUE@ @D.NeedNoun.ON@ --> Level 17 (final)
xfst[1]: up Windows+N+Prop+Sg+Nom
Windows
xfst[1]: up Windows+N+Prop+Sem/Obj+Sg+Nom
Windows
xfst[1]: down Windows
Windows+N+Prop+Sem/Obj+Attr
Windows+N+Prop+Sem/Obj+Sg+Nom
Windows+N+Prop+Attr
Windows+N+Prop+Sg+Nom

$ lookup -q src/generator-gt-norm.xfst
Windows+N+Prop+Sem/Obj+Sg+Nom
Windows+N+Prop+Sem/Obj+Sg+Nom Windows+N+Prop+Sem/Obj+Sg+Nom +?
Windows+N+Prop+Sg+Nom
Windows+N+Prop+Sg+Nom Windows
Further, it seems restricted to (one of) the latest version(s) of lookup only:
$ lookup -v
lookup 2.5.19 (2.25.11)
Using an older version of lookup, everything works as expected:
$ ~/Downloads/xerox.100211/lookup -v
lookup 2.5.14 (2.14.10)
$ ~/Downloads/xerox.100211/lookup -q src/generator-gt-norm.xfst
Windows+N+Prop+Sg+Nom
Windows+N+Prop+Sg+Nom Windows
Windows+N+Prop+Sem/Obj+Sg+Nom
Windows+N+Prop+Sem/Obj+Sg+Nom Windows
Date: 2016-12-15 20:58:10 +0100
From: Trond Trosterud
This is a Xerox bug, and given the state of affairs, I suggest a WONTFIX.
Date: 2016-12-18 13:40:57 +0100
From: Trond Trosterud
Thus, consensus on a WONTFIX.
This issue was created automatically with bugzilla2github
Bugzilla Bug 1739
Date: 2013-11-14T16:27:28+01:00 From: Trond Trosterud <>
To: Trond Trosterud <>
CC: sjur.n.moshagen
Last updated: 2016-12-18T13:40:57+01:00