giellalt / bugzilla-dummy


sma and sme number generators are flawed (Bugzilla Bug 1507) #1633

Closed: albbas closed this issue 9 years ago.

albbas commented 11 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 1507

Date: 2012-11-07T22:54:41+01:00 From: Trond Trosterud <> To: Sjur Nørstebø Moshagen <> CC: sjur.n.moshagen

Last updated: 2015-02-12T23:57:09+01:00

albbas commented 11 years ago

Comment 7342

Date: 2012-11-07 22:54:41 +0100 From: Trond Trosterud <>

smj$ xfst -e "load src/transcriptions/numbers.xfst "
Opening input file 'src/transcriptions/numbers.xfst'
October 07, 2012 22:58:29 GMT
Closing input file 'src/transcriptions/numbers.xfst'
Copyright © Palo Alto Research Center 2001-2012
Xerox Finite-State Tool, version 2.10.7 (2.11.1)

Type "help" to list all commands available or "help help" for further help.

xfst[1]: random-upper
+StringB +Stringo +NumNumgovhteluhkieåktsede gráda +String +NumNumvïjhteluhkiegaektsie +NumNum ja +NumNumakte ja +StringcäMA +NumNumgaektsielåhkede gráda +String +NumNum +NumNumgöökte +StringđDŽölŋ +StringÖĐJEŊsH +StringO +NumNum +NumNumnieljie+Use/NG miljona gööktestoerreluhkie+Use/NGgöökteluhkiegööktetåvsenegovhtetjuetiegovhtelåhkede +NumNumnjealjede paragráfa +StringöLcÁŦØåN +StringnLæđGUJKza
xfst[1]: random-lower
+NumNum5. +NumNum950=+NumNum1&+StringYFT +StringA=+StringGZ=+NumNum122. +Stringä +NumNum6.°+NumNum +NumNum9.§+StringŋT +StringebiØčsåUČKÆ +String +NumNum +Stringl +StringyŋTII +StringU +String +NumNum120.§+NumNum +StringWf

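Leaked helper tags like the ones in this output can be spotted mechanically. A minimal Python sketch, assuming the only helper tags are the two named in the svn log quoted in the next comment (`+String` and `+NumNum`); the function name `leaked_tags` is hypothetical and not part of the Giellatekno tooling:

```python
# Hypothetical helper: the tag list is an assumption taken from the svn log,
# not read from the actual numbers.xfst source.
HELPER_TAGS = ("+String", "+NumNum")


def leaked_tags(line: str) -> list[str]:
    """Return the helper tags that occur in one line of random-upper/random-lower output."""
    return [tag for tag in HELPER_TAGS if tag in line]


print(leaked_tags("+NumNum5. +NumNum950=+NumNum1&+StringYFT"))  # ['+String', '+NumNum']
print(leaked_tags("guovvamánu guoktelogigoalmmát beaivi"))  # []
```

A non-empty result for any output line suggests the transducer was saved with the helper tags still in place.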

albbas commented 11 years ago

Comment 7343

Date: 2012-11-07 23:00:08 +0100 From: Trond Trosterud <>

Quoting the svn log:


r35365 | sjur | 2010-11-08 10:40:53 +0100 (man, 08 nov 2010) | 1 line

Renamed +Num to +NumNum, to avoid trouble with the POS tag +Num.

r35362 | sjur | 2010-11-08 10:27:27 +0100 (man, 08 nov 2010) | 1 line

Added the tags +String and +Num, to be able to differentiate them when doing fst manipulations. To be removed before the transducer is saved.


(note especially the last sentence :-)
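The intent stated in r35362 is that the helper tags exist only while the transducer is being built. At the string level, removing them amounts to something like the Python sketch below (the function name `strip_helper_tags` and the analysis `akte+Num` are made up for illustration); in the real build the removal would presumably be done on the transducer itself before saving, not on output strings. The second call shows why the rename in r35365 matters: a distinct `+NumNum` can be deleted without touching the POS tag `+Num`.

```python
# Assumed tag list, taken from the svn log above.
HELPER_TAGS = ("+String", "+NumNum")


def strip_helper_tags(analysis: str) -> str:
    """Delete the temporary helper tags from one analysis string (string-level illustration only)."""
    for tag in HELPER_TAGS:
        analysis = analysis.replace(tag, "")
    return analysis


print(strip_helper_tags("+NumNumakte ja"))  # akte ja
print(strip_helper_tags("akte+Num"))  # akte+Num (the POS tag survives)
```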

albbas commented 11 years ago

Comment 7344

Date: 2012-11-07 23:09:05 +0100 From: Trond Trosterud <>

Changing the preamble to

LEXICON Root
!< %+String [a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|æ|ø|å|ä|ö|á|č|đ|ŋ|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|Æ|Ø|Å|Ä|Ö|Á|Č|Đ|Ŋ|Š|Ŧ|Ž]* > COMMA ; ! This first line is to allow all letter strings.
!+NumNum NUMBERSECTION ;
!+NumNum COMMASECTION ;
NUMBERSECTION ;
COMMASECTION ;
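Hand-typed letter alternations like this are easy to get subtly wrong. A small Python check (the name `check_alternation` is hypothetical) run on the letter list as originally posted finds `q` listed twice where `x` was presumably intended, and uppercase `Š`, `Ŧ`, `Ž` with no lowercase counterparts; whether that last omission is deliberate is an assumption this sketch cannot settle.

```python
from collections import Counter

# The letter list exactly as originally posted in the commented-out lexc line.
POSTED = (
    "a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|q|y|z|æ|ø|å|ä|ö|á|č|đ|ŋ|"
    "A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|Æ|Ø|Å|Ä|Ö|Á|Č|Đ|Ŋ|Š|Ŧ|Ž"
)


def check_alternation(alternation: str) -> tuple[list[str], list[str]]:
    """Return (duplicated symbols, letters whose other-case partner is absent)."""
    symbols = alternation.split("|")
    duplicates = sorted(s for s, n in Counter(symbols).items() if n > 1)
    present = set(symbols)
    # A letter is flagged when its swapped-case form exists but is not listed.
    missing_partner = sorted(
        s.swapcase()
        for s in present
        if s.swapcase() != s and s.swapcase() not in present
    )
    return duplicates, missing_partner


print(check_alternation(POSTED))  # (['q'], ['x', 'š', 'ŧ', 'ž'])
```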

... I get the following:

4
4 njieljie
4 njielje +Use/NG
4 nieljie +Use/NG
4 nielje +Use/NG

So, for a number word → Arabic numeral → number word automaton, I need this LEXICON Root plus a filter that removes +Use/NG strings in generation (cf. http://giellatekno.uit.no/num.nno.html, which is now broken for smj and sma).
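The generation filter asked for here can be illustrated at the string level. This Python sketch (hypothetical function `drop_not_for_generation`) keeps only the outputs without the `+Use/NG` tag, using the four outputs for `4` listed above; in the actual build the same effect would presumably come from composing the transducer with a restriction that excludes `+Use/NG` paths, rather than from post-filtering strings.

```python
def drop_not_for_generation(outputs: list[str], tag: str = "+Use/NG") -> list[str]:
    """Keep only generator outputs that are not marked with the given usage tag."""
    return [out for out in outputs if tag not in out]


# The four outputs listed above for the input 4, copied as shown.
outputs_for_4 = ["njieljie", "njielje +Use/NG", "nieljie +Use/NG", "nielje +Use/NG"]
print(drop_not_for_generation(outputs_for_4))  # ['njieljie']
```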

Other uses can perhaps rely on the Root lexicon as it currently stands in svn.

albbas commented 11 years ago

Comment 8005

Date: 2013-02-26 22:36:00 +0100 From: Sjur Nørstebø Moshagen <>

Don't know when I will have time to fix this, though...

albbas commented 9 years ago

Comment 10155

Date: 2015-02-12 23:57:09 +0100 From: Sjur Nørstebø Moshagen <>

It seems that this bug has been fixed by the restructuring of the transcriptor files last year:

$ xfst -s src/transcriptions/transcriptor-numbers-digit2text.filtered.lookup.xfst
xfst[1]: print random-upper
1000001200. 1. 26.5.70§1.&2.,1800.°3848855§8.4-mánnosačča 24.7. 26.4.399. 13.2.280§7 1207150-jahkásaččat 28.01. 1.&3. 18.10.02. 25.9.19..10=171580700.-1&3 23.04.161.,8.,2.=13°1500.&2-gearddásažžii &2.&2672. 2.4. 14.6.
xfst[1]: print random-lower
guovvamánu guoktelogigoalmmát beaivi gávccilotjahkásažžan sárggis čuođilogát lea guđamánnosaččas vuosttaš čuokkis njealját lea gávccilogiguđajahkásaš guovvamánu oktanuppelogát beaivi vihttaduhátgávccičuođinjealljeloginubbi kolon vuosttaš komma viđát lea duhátovccičuođigolbmalotjahkásaš guokte sárggis goalmmát paragráfa guoktemiljovnnavihttačuođiovccilogigolbmaduhátgolbmačuođičiežanuppelotjahkásaš duhátčiežačuođiguoktelogiguokte lea okta čuokkis guhtta ja guokte lea ovcci ja nubbi paragráfa ovccát ođđajágimánu guokteloginubbi beaivi njealljelotlohkui násti logigearddásaččat guđát paragráfa gávccát gráda čuođát komma miljončiežačuođigávccilogigávcciduhátgávccičuođivihttalogi gráda logiduhátnjeallječuođát násti ovccičuođát gráda njealját čuokkis goalmmát ja goalmmát čuokkis njealját duhátguhttačuođiguoktelogi kolon čieža skábmamánu čihččet beaivi gávccilogi komma duhátčiežačuođát miljárdagávccičuođigávccilogivihttamiljovnnaduhátgávccičuođinjealljelogi guđát sárggis čuođát sárggis guđát

Closing the bug.