0xCAB / morphisto

Automatically exported from code.google.com/p/morphisto

"ordn-" vs. "kett-" #48

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?

echo "Ordnung" | fst-infl2  -b smor-ids.ca

What is the expected output? What do you see instead?
Expected:
> Ordnung
o:Ordn:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Nom>:<><Sg>:<>
o:Ordn:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Gen>:<><Sg>:<>
o:Ordn:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Dat>:<><Sg>:<>
o:Ordn:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Akk>:<><Sg>:<>
Seen:
> Ordnung
Ordnung<+NN>:<><Fem>:<><Nom>:<><Sg>:<>
Ordnung<+NN>:<><Fem>:<><Gen>:<><Sg>:<>
Ordnung<+NN>:<><Fem>:<><Dat>:<><Sg>:<>
Ordnung<+NN>:<><Fem>:<><Akk>:<><Sg>:<>

Other verb stems such as "kett-" are derived correctly. The full-form entry 
for "Ordnung" should normally be superfluous.

Grepping for ">ordn" and ">kett" yields the same results. Some phonological 
constraints are probably interfering. Any ideas?

Original issue reported on code.google.com by wuerz...@gmail.com on 30 Aug 2011 at 11:13

GoogleCodeExporter commented 8 years ago
The same problem occurs with "rechn-" and "ebn-".

Original comment by wuerz...@gmail.com on 30 Aug 2011 at 11:35

GoogleCodeExporter commented 8 years ago
I disagree with this. Dictionary forms should be displayed in any case, and 
Ordnung is a dictionary form. Ordn is not a dictionary form, so the expected 
output makes no sense to me.

Original comment by glukri...@gmx.de on 30 Aug 2011 at 12:31

GoogleCodeExporter commented 8 years ago
A question to @wuerz:
What is your purpose with morphisto? What are you using it for?

Original comment by eleonor...@gmx.net on 30 Aug 2011 at 12:34

GoogleCodeExporter commented 8 years ago
Argh, of course you're right. The expected output should look like this:
Expected:
> Ordnung
o:Ordne:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Nom>:<><Sg>:<>
o:Ordne:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Gen>:<><Sg>:<>
o:Ordne:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Dat>:<><Sg>:<>
o:Ordne:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Akk>:<><Sg>:<>

or without "-b": ordnen<V>ung<SUFF><+NN>.

Nonetheless, "Ordnung" is a transparent derivation of "ordnen" and therefore 
potentially superfluous in the lexicon. It will be kept in the trunk's 
dictionary, though. My aim with morphisto is to reduce its lexicon to the 
morphologically simple entries by removing all derivations and compounds, so 
that real morphological segmentation becomes possible. Of course, I am only 
doing this in the "kmw" branch, and in a transparent, reconstructible way. All 
of the fixes you commit will be included in the main development branch.

On the other hand, I really doubt that the lexicon of any morphological 
analysis tool should be bloated with words like "Abschiebegewahrsam". I can 
think of no application for morphisto where one would need this entry, since 
morphisto does not come with any word semantics.

Original comment by wuerz...@gmail.com on 30 Aug 2011 at 12:46

GoogleCodeExporter commented 8 years ago
Thanks for the explanation. If such changes only go into a special branch, I 
have no problem with that.
For any translation project, Ordnung, like all other dictionary words, is a 
must-have.

I agree with you: Abschiebegewahrsam is a true compound that can be in 
morphisto as the sum of two words, as it is now. 
Exceptions are words like Hammelsprung, which may mean something very 
different from the sum of the words they are formed from. 

Original comment by eleonor...@gmx.net on 30 Aug 2011 at 2:04

GoogleCodeExporter commented 8 years ago
@CWRSimon Could you please have a look at this issue? I browsed through 
phon.fst but did not find an appropriate rule.

Original comment by wuerz...@gmail.com on 31 Aug 2011 at 9:03

GoogleCodeExporter commented 8 years ago
The problem does not occur for "atm-". It seems to be specific to verbal stems 
ending in "-n".

Original comment by wuerz...@gmail.com on 31 Aug 2011 at 11:23

GoogleCodeExporter commented 8 years ago
The evil rule lives in defaults.fst:
$R$ = ([bdgptkfs] | ch) n <=> <en> (<V>)

It is thus possible to derive "Ordenung":
> Ordenung
ordnen<V>ung<SUFF><+NN><Fem><Nom><Sg>
ordnen<V>ung<SUFF><+NN><Fem><Gen><Sg>
ordnen<V>ung<SUFF><+NN><Fem><Dat><Sg>
ordnen<V>ung<SUFF><+NN><Fem><Akk><Sg>

Any ideas?

Original comment by wuerz...@gmail.com on 31 Aug 2011 at 6:49

GoogleCodeExporter commented 8 years ago
> ordentlich
ordentlich<+ADJ><Pos><Adv>
ordentlich<+ADJ><Pos><Pred>

> ordenlich
ordnen<V>lich<SUFF><+ADJ><Pos><Adv>
ordnen<V>lich<SUFF><+ADJ><Pos><Pred>

Original comment by wuerz...@gmail.com on 8 Sep 2011 at 1:30

GoogleCodeExporter commented 8 years ago
> ordenlich
ord<>:ene:<>n:<><V>:<>lich<SUFF>:<><+ADJ>:<><Pos>:<><Adv>:<>
ord<>:ene:<>n:<><V>:<>lich<SUFF>:<><+ADJ>:<><Pos>:<><Pred>:<>

> Kettung
k:Kette:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Nom>:<><Sg>:<>
k:Kette:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Gen>:<><Sg>:<>
k:Kette:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Dat>:<><Sg>:<>
k:Kette:<>n:<><V>:<>ung<SUFF>:<><+NN>:<><Fem>:<><Akk>:<><Sg>:<>

Original comment by wuerz...@gmail.com on 8 Sep 2011 at 1:32

GoogleCodeExporter commented 8 years ago
Added the following rule to phon.fst:

$R11b$ = ([bdgptkfs] | ch) <e> <=> <> ([n] (<CB>|$Bound$) [aeiou])

This rule deletes the previously inserted "<e>" (which is to be understood as a 
request for e-insertion in phon.fst) under certain circumstances (i.e. if the 
following morph starts with a vowel). Now we can distinguish 
"orden-*b*ar" from "Ordn-*ung*", and the non-existent word "Ordenung" can no 
longer be derived.
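
As a rough illustration of the effect described above, the deletion can be 
mimicked with a regular expression in Python. This is only a sketch and not 
SFST code: the intermediate strings such as "ord<e>n<CB>ung", the function 
names, and the lowercase-only matching are assumptions made for the example, 
and only <CB> is modelled as a boundary ($Bound$ is left out).

```python
import re

# Left context: one of b d g p t k f s, or "ch" (captured so it can be kept).
# Target: the marker "<e>", i.e. the request for e-insertion.
# Right context (lookahead): "n", the boundary "<CB>", then a vowel.
# Assumption: only "<CB>" is modelled; $Bound$ from phon.fst is not.
DELETE_E = re.compile(r"([bdgptkfs]|ch)<e>(?=n<CB>[aeiou])")


def apply_r11b(form: str) -> str:
    """Delete "<e>" wherever the context of the rule sketched above holds."""
    return DELETE_E.sub(r"\1", form)


def realize(form: str) -> str:
    """Apply the deletion, spell out any surviving "<e>" as "e",
    and drop the boundary markers (hypothetical surface realization)."""
    return apply_r11b(form).replace("<e>", "e").replace("<CB>", "")


if __name__ == "__main__":
    for form in ("ord<e>n<CB>ung", "ord<e>n<CB>bar", "rech<e>n<CB>ung"):
        print(form, "->", realize(form))
```

Under these assumptions, "ord<e>n<CB>ung" comes out as "ordnung", while 
"ord<e>n<CB>bar" keeps its e and comes out as "ordenbar".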

Original comment by wuerz...@gmail.com on 9 Sep 2011 at 11:02

GoogleCodeExporter commented 8 years ago
Issue 17 has been merged into this issue.

Original comment by wuerz...@gmail.com on 9 Sep 2011 at 2:22