0xCAB / morphisto

Automatically exported from code.google.com/p/morphisto

Verb prefixes do not show up as verb prefixes in analysis mode #49

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
If a linguistic problem:
What wordform makes the faulty analysis occur?
Some verb prefixes (not all of them) show up as ADV, but they do not show up as 
verb prefixes:
>hin
hin<+ADV>

They all have entries like:
<Pref_Stems>frei<PREF><V><nativ>
Why is this not shown in analysis mode?

A verb prefix can stand alone:
Er ging hinüber ("He went across")
Sie nahmen teil ... ("They took part ...")

Therefore the analysis should show that the word is a verb prefix.

Original issue reported on code.google.com by eleonor...@gmx.net on 30 Aug 2011 at 1:08

GoogleCodeExporter commented 8 years ago
This might be related to issue 36.

Original comment by eleonor...@gmx.net on 30 Aug 2011 at 1:10

GoogleCodeExporter commented 8 years ago
You are definitely right. What we need is an additional category like 'PTKVZ' 
in STTS.

Original comment by wuerz...@gmail.com on 30 Aug 2011 at 3:24

GoogleCodeExporter commented 8 years ago
I modified the .fst files in branches/kmw to include the inflection class 
"Ptkl-Vz". Separable verb prefixes may now be defined as 

<BaseStem>
  <Lemma>auf</Lemma>
  <Stem>auf</Stem>
  <Pos>OTHER</Pos>
  <Origin>nativ</Origin>
  <InfClass>Ptkl-Vz</InfClass>
</BaseStem>

This should result in auf<+PTKL><Vz> (where Vz denotes Verbzusatz, as in STTS).

Original comment by wuerz...@gmail.com on 31 Aug 2011 at 9:55
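
A minimal sketch of how such entries could be generated for several separable prefixes at once, assuming every entry follows the same pattern as the "auf" example above. The helper names base_stem_entry and render are hypothetical and not part of morphisto; only the element names and values are taken from the quoted entry.

```python
import xml.etree.ElementTree as ET


def base_stem_entry(prefix):
    """Build one <BaseStem> element mirroring the 'auf' entry quoted in the thread."""
    entry = ET.Element("BaseStem")
    for tag, text in (
        ("Lemma", prefix),
        ("Stem", prefix),
        ("Pos", "OTHER"),
        ("Origin", "nativ"),
        ("InfClass", "Ptkl-Vz"),
    ):
        ET.SubElement(entry, tag).text = text
    return entry


def render(prefixes):
    """Serialise one entry per prefix, one entry per line (no pretty-printing)."""
    return "\n".join(
        ET.tostring(base_stem_entry(p), encoding="unicode") for p in prefixes
    )


if __name__ == "__main__":
    # The prefix list is illustrative; it is not the project's actual inventory.
    print(render(["auf", "an", "aus"]))
```

Generating the entries this way still duplicates information that already lives in prefixes.xml, which is the overhead discussed later in the thread.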

GoogleCodeExporter commented 8 years ago
Issue 36 has been merged into this issue.

Original comment by wuerz...@gmail.com on 31 Aug 2011 at 9:56

GoogleCodeExporter commented 8 years ago
Since we cannot yet distinguish separable verb prefixes in prefixes.xml, I 
prefer this solution (i.e., hard-coding them in e.g. others.xml) over a 
"null morph conversion" in suff_stems.xml.

Original comment by wuerz...@gmail.com on 31 Aug 2011 at 9:59

GoogleCodeExporter commented 8 years ago
What do the classes Pref/X (Pref/V, Pref/Adj and others) in others.xml signify? 
Isn't the current <+VPRE> synonymous with PTKVZ?

Original comment by rico.sen...@googlemail.com on 31 Aug 2011 at 1:32

GoogleCodeExporter commented 8 years ago
Possibly. But there is only one entry with
"<InfClass>Pref/V</InfClass>" in others.xml, namely "lieben". This
would not be "PTKVZ" according to the STTS standard, although it
might be an erroneous entry. What do you suggest?

Original comment by wuerz...@gmail.com on 31 Aug 2011 at 1:41

GoogleCodeExporter commented 8 years ago
I think "lieben" is an error, and Pref/X is intended to signify verb particles:

> ab
ab<+PREP><Dat>
ab<+VPRE>

It is true that there are only four entries in others.xml, two of them probably 
wrong. I don't care about whether to use <+VPRE> or <+PTKVZ>, as long as it's 
consistent.

Personally, I'd automatically generate the analysis <+VPRE> for all PrefStems 
in prefixes.xml that have the tag <Pos>V</Pos>. This generates less overhead 
than having to duplicate all 200 entries (you could do the initial duplication 
automatically, but this would create extra work whenever you want to modify/add 
a prefix). 

Automatically analysing all verb prefixes in prefixes.xml as verb particles 
will generate a few false positives, like "ver" and "be". If you want to get 
rid of them, you could introduce a new tag to distinguish between separable and 
non-separable prefixes (or set "<InfClass>" for all separable ones, for 
instance).

This solution requires a bit of knowledge about the transducer though, and I 
don't currently know how to best implement this.

Original comment by rico.sen...@googlemail.com on 31 Aug 2011 at 2:09
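
The proposal above can be sketched outside the transducer, purely as an illustration. The sketch assumes that prefixes.xml contains <PrefStem> elements with <Stem> and <Pos> children; that layout, the sample data, and the function name verb_prefix_analyses are assumptions, not the actual morphisto lexicon format.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; the real prefixes.xml layout may differ.
SAMPLE_PREFIXES_XML = """
<Prefixes>
  <PrefStem><Stem>an</Stem><Pos>V</Pos></PrefStem>
  <PrefStem><Stem>ver</Stem><Pos>V</Pos></PrefStem>
  <PrefStem><Stem>un</Stem><Pos>ADJ</Pos></PrefStem>
</Prefixes>
"""


def verb_prefix_analyses(xml_text, tag="<+VPRE>"):
    """Return one analysis string per prefix stem whose <Pos> is V."""
    root = ET.fromstring(xml_text.strip())
    analyses = []
    for entry in root.iter("PrefStem"):
        stem = entry.findtext("Stem")
        if stem and entry.findtext("Pos") == "V":
            analyses.append(stem + tag)
    return analyses


if __name__ == "__main__":
    for analysis in verb_prefix_analyses(SAMPLE_PREFIXES_XML):
        print(analysis)
```

On this sample the non-separable "ver" is emitted alongside "an", which is the kind of false positive described above; filtering it out needs extra information about separability.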

GoogleCodeExporter commented 8 years ago
I only grepped for "Pref/V". There is also "Pref/Sep". Sorry for missing that 
one. I agree with your proposal. It might even be possible to implement that 
via a "SuffStem" without touching the transducer. I will look into that.

Original comment by wuerz...@gmail.com on 31 Aug 2011 at 2:21

GoogleCodeExporter commented 8 years ago
Here is my proposal: Separable and non-separable prefixes can be distinguished 
by the way the past participle of the corresponding verbs is made up (i.e. 
"verschafft" vs. "angeschafft"). This is implemented through the feature 
"<no-ge>" in morphisto (cf. deko.fst). The attached file filters the prefixes 
for this feature and creates an analysis as "<+VPRE>". If you like that 
solution, I will integrate it. Feel free to modify (simplify) the transducer.

Original comment by wuerz...@gmail.com on 8 Sep 2011 at 8:28
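
The idea behind the <no-ge> filter can be illustrated with a small sketch. It is not the attached transducer: the Prefix class, its no_ge field, and both function names are hypothetical, and the participle formation is reduced to the single example from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prefix:
    form: str
    no_ge: bool  # True when the participle takes no "ge", i.e. the prefix is non-separable


def past_participle(prefix, participle_stem):
    """Separable prefixes take 'ge' between prefix and stem; non-separable ones do not."""
    infix = "" if prefix.no_ge else "ge"
    return prefix.form + infix + participle_stem


def separable_analyses(prefixes, tag="<+VPRE>"):
    """Keep only prefixes without the no-ge feature and tag them as verb particles."""
    return [p.form + tag for p in prefixes if not p.no_ge]


if __name__ == "__main__":
    ver = Prefix("ver", no_ge=True)
    an = Prefix("an", no_ge=False)
    print(past_participle(ver, "schafft"))  # verschafft
    print(past_participle(an, "schafft"))   # angeschafft
    print(separable_analyses([ver, an]))
```

Reusing a feature that the lexicon already encodes avoids adding a separate separability tag to every prefix entry.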

GoogleCodeExporter commented 8 years ago
Looks great! I'm all for integrating it into the trunk.

Original comment by rico.sen...@googlemail.com on 8 Sep 2011 at 8:41

GoogleCodeExporter commented 8 years ago
It works. Concerning the right category, there are three options:
- PTKVZ (from STTS)
- VPRE (from SMOR, currently used)
- PTKL/Vz (from SMOR, in line with other types of particles)

I like the third option. What do you think?

> ein
ein<+VPRE>
ein<+ART><Indef><Masc><Nom><Sg>
ein<+ART><Indef><Neut><Nom><Sg>
ein<+ART><Indef><Neut><Akk><Sg>
> aus
aus<+PTKL><Vz>
aus<+VPRE>
aus<+PREP><Dat>
> durch
durch<+VPRE>
durch<+PREP><Akk>
> ver
no result for ver
> zu
zu<+PTKL><Adj>
zu<+PTKL><zu>
zu<+VPRE>
zu<+PREP><Dat>
> auf
auf<+PTKL><Vz>
auf<+VPRE>
auf<+PREP><Dat>
auf<+PREP><Akk>
> an
an<+PTKL><Vz>
an<+VPRE>
an<+CIRCP>
an<+PREP><Dat>
an<+PREP><Akk>
> weg
weg<+VPRE>
weg<+ADV>
> hin
hin<+VPRE>
hin<+ADV>

Original comment by wuerz...@gmail.com on 8 Sep 2011 at 4:15

GoogleCodeExporter commented 8 years ago
I think <+PTKL><Vz> would be a good choice. The existing code that generates 
<+VPRE> analyses should be removed/disabled, so that there's no confusion.

Original comment by rico.sen...@googlemail.com on 9 Sep 2011 at 9:47
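
As an illustration of what consistent output would look like, the sketch below rewrites <+VPRE> to <+PTKL><Vz> in a list of analysis strings and drops the duplicates that result. It is only a post-processing stand-in under that assumption, not the change to the transducer suggested above, and the function name unify_verb_particle_tags is hypothetical.

```python
def unify_verb_particle_tags(analyses):
    """Rewrite <+VPRE> to <+PTKL><Vz> and drop duplicates, preserving order."""
    seen = set()
    unified = []
    for analysis in analyses:
        normalised = analysis.replace("<+VPRE>", "<+PTKL><Vz>")
        if normalised not in seen:
            seen.add(normalised)
            unified.append(normalised)
    return unified


if __name__ == "__main__":
    # Analyses for "aus" as quoted earlier in the thread.
    aus = ["aus<+PTKL><Vz>", "aus<+VPRE>", "aus<+PREP><Dat>"]
    for line in unify_verb_particle_tags(aus):
        print(line)
```

For the "aus" example this leaves one particle reading and the preposition reading.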