Closed by albbas 14 years ago
Date: 2010-06-12 21:15:09 +0200
From: Francis Tyers
It would be nice for the purposes of MT from Finnish→North Sámi to be able to reduce the number of optional forms created in generation to one.
$ echo "200 vuoden historia" | apertium -d . fin-sme-chunker
^200
$ echo "200 vuoden historia" | apertium -d . fin-sme
200/200e/200d/200b/200š/200c/200:/200:e/200:d/200:b/200:š/200:c #jahki
==============================================================================
$ echo "200+Num+Sg+Nom" | dsme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
200+Num+Sg+Nom	200
200+Num+Sg+Nom	200-
200+Num+Sg+Nom	200-b
200+Num+Sg+Nom	200-c
200+Num+Sg+Nom	200-d
200+Num+Sg+Nom	200-e
200+Num+Sg+Nom	200-š
200+Num+Sg+Nom	200b
200+Num+Sg+Nom	200c
200+Num+Sg+Nom	200d
200+Num+Sg+Nom	200e
200+Num+Sg+Nom	200š
200+Num+Sg+Nom	200'
200+Num+Sg+Nom	200'b
200+Num+Sg+Nom	200'c
200+Num+Sg+Nom	200'd
200+Num+Sg+Nom	200'e
200+Num+Sg+Nom	200'š
200+Num+Sg+Nom	200:
200+Num+Sg+Nom	200:b
200+Num+Sg+Nom	200:c
200+Num+Sg+Nom	200:d
200+Num+Sg+Nom	200:e
200+Num+Sg+Nom	200:š
==============================================================================
The ideal fix would be something we can just grep out of the lexc file. See example in:
http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-sme-fin/dev/update-lexc.sh
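As a rough illustration of the "grep it out of the lexc file" idea, here is a minimal sketch. It is not the content of update-lexc.sh; the function name strip_substandard and the three sample entries are made up for the example, and it assumes that every unwanted continuation sits on its own line and carries the literal tag +Use/Sub.

```shell
# Hypothetical helper (not taken from update-lexc.sh):
# drop every lexc line that carries the literal tag +Use/Sub.
# -F treats the pattern as a fixed string, -v inverts the match.
strip_substandard() {
    grep -v -F '+Use/Sub' "$@"
}

# Made-up lexicon fragment, for demonstration only.
printf '%s\n' \
    '+Sg+Nom:0 # ;' \
    '+Sg+Nom+Use/Sub:%-b # ;' \
    '+Sg+Nom+Use/Sub+Use/Circ:%:b # ;' | strip_substandard
```

With file arguments the same function filters a lexc file instead of standard input. A line-based filter like this would miss tags introduced through continuation lexicons, so it only helps where the tag appears on the entry line itself.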
Date: 2010-06-12 22:19:43 +0200
From: Sjur Nørstebø Moshagen
Thomas, can you have a look? Most of these forms look like +Use/Sub to me (all those ending in -X and 'X), and I don't understand why a number compounded with a letter would be analysed as the nominative of that number. It all looks very buggy to me.
Also the single number ending in a hyphen should not have been analysed as a nominative, but as a compound form.
Date: 2010-06-12 22:22:18 +0200
From: Sjur Nørstebø Moshagen
Assigning it, and raising the importance of it, as this bug probably affects several components using the sme transducer.
Date: 2010-06-13 01:25:48 +0200
From: Francis Tyers
From bug #849:
~/gtsvn/gt$ echo "200+Num+Sg+Nom" | dsme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
200+Num+Sg+Nom	200
Fixed as of version 32426. The problem was that several lower sides shared an identical upper side. It should now be fine in sme (at least there is no counterexample). The issue is still open in smj.
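To check for the same kind of problem elsewhere, one could scan generator output for upper sides that map to more than one lower side. The sketch below is only an illustration: the function name ambiguous_upper is made up, and it assumes the lookup output has been saved as one tab-separated "analysis, form" pair per line.

```shell
# Hypothetical helper: list analyses (upper sides) that generate more than
# one surface form (lower side).
# Input: one "analysis<TAB>form" pair per line; other lines are ignored.
ambiguous_upper() {
    awk -F '\t' 'NF == 2 {
        count[$1]++
        # Collect the forms for each analysis, separated by "/".
        forms[$1] = forms[$1] (forms[$1] == "" ? "" : "/") $2
    }
    END {
        # Print analysis, number of forms, and the forms themselves.
        for (a in count) if (count[a] > 1) print a "\t" count[a] "\t" forms[a]
    }'
}

# Three of the pairs reported above, as sample input.
printf '200+Num+Sg+Nom\t200\n200+Num+Sg+Nom\t200b\n200+Num+Sg+Nom\t200:e\n' | ambiguous_upper
```

An analysis that generates exactly one form produces no output, so an empty result means no ambiguity was found in the sample that was fed in.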
============================================================================
Seems to work nicely now thanks!
$ echo "200 vuoden historia " | apertium -d . fin-sme
200 jagi historjá
Date: 2010-06-13 09:02:39 +0200
From: Sjur Nørstebø Moshagen
Just a final note:
Almost all - if not all - of the unwanted forms are marked +Use/Sub in the lexicon, identifying them as substandard forms. AFAICU you don't want substandard forms in generation, so one measure to take in any case is to remove these forms from the generating transducers. I would assume this to be the default behaviour.
If this is the case, then a line like the following is actually redundant:
+Use/Sub+Use/Circ+Use/NG: ARABICCASECOLL ; ! This is the 1984s case.
since removing all +Use/Sub strings from the generating transducer would remove it. That is, +Use/NG would then only be needed for cases that are within the official norm.
Date: 2010-06-13 13:10:37 +0200
From: Francis Tyers
(In reply to comment #4)
I added this rule (thanks Tommi and Unhammer!) and it seems to have resolved the problem.
!%+Dial/%-KJ Uselesspaths = %+Use/NG %+Use/Sub %+Use/NG %+Dial/%-GG %+Dial/%-GS ;
"Try again" Uselesspaths:0 /<= _ ;
Date: 2010-06-13 14:17:13 +0200
From: Trond Trosterud
Sjur wrote: "I don't understand why a number compounded with a letter would be analysed as the nominative of that number."
That is easy to explain: We had entries like:
+Sg+Nom+Use/Sub+Use/Circ:f # ; ! s. 123f. ! !
+Sg+Nom+Use/Sub+Use/Circ:ff # ; ! s. 123ff. ! !
They are now changed to:
f+Sg+Nom+Use/Sub+Use/Circ:f # ; ! s. 123f. ! !
ff+Sg+Nom+Use/Sub+Use/Circ:ff # ; ! s. 123ff. ! !
The problem never surfaced until we started really generating stuff, like we do with the MT now.
I will keep the bug open until it has been fixed for the other languages.
Date: 2010-06-13 15:40:26 +0200
From: Trond Trosterud
echo "200+Num+Sg+Nom" | dsmj
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
200+Num+Sg+Nom	200
~/gtsvn$ see gt/sma/src/numeral-sma-lex.txt
~/gtsvn$ echo "200+Num+Sg+Nom" | dsma
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
200+Num+Sg+Nom	200
~/gtsvn$ echo "200+Num+Sg+Nom" | dfao
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
200+Num+Sg+Nom	200+Num+Sg+Nom	+?
~/gtsvn$ echo "200+Num+Sg+Nom" | dsmn
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
200+Num+Sg+Nom	200+Num+Sg+Nom	+?
So, sme, smj, sma are ok here. Let the rest come when needed.
This issue was created automatically with bugzilla2github
Bugzilla Bug 848
Date: 2010-06-12T21:15:09+02:00
From: Francis Tyers
To: Thomas Omma
CC: sjur.n.moshagen, trond.trosterud, @unhammer@fsfe.org
Last updated: 2010-06-13T15:40:26+02:00