UniversalDependencies / UD_German-GSD


#14 Improving POSTAG and FEATS with mate and Tiger #16

Closed adrianeboyd closed 6 years ago

adrianeboyd commented 6 years ago

The XPOS and FEATS have been updated using the mate parser with Tueba-D/Z. The mate parser model was trained on the first 80% of Tueba-D/Z as converted by an updated version of the TuebaUdConverter that improved the morphological tag extraction. Tueba-D/Z tokens containing hyphens were split into multiple tokens to align with the UD tokenization.
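The hyphen splitting can be sketched as follows. This is a hypothetical helper, not the actual TuebaUdConverter code; it assumes the UD convention of keeping each hyphen as its own token.

```python
def split_hyphenated(token):
    """Split a hyphenated token into subtokens, keeping each hyphen as
    a separate token, to align with the UD tokenization. A simplified
    sketch; the real converter handles more edge cases."""
    parts = []
    for i, piece in enumerate(token.split("-")):
        if i:
            parts.append("-")
        if piece:
            parts.append(piece)
    return parts
```

For example, `split_hyphenated("Baden-Württemberg")` yields `["Baden", "-", "Württemberg"]`, while tokens without hyphens are returned unchanged as a single-element list.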

During tagging, `` '' and -- were temporarily normalized to " and -, and some new spellings were normalized back to old spellings, to be compatible with the Tueba-D/Z model. Tueba-D/Z is fairly homogeneous and somewhat dated, so the tagger does not perform as well on user-generated content as on older newspaper text. A detailed analysis is below.
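The temporary normalization might look like the following sketch. The mapping shown is only the quote/dash part mentioned above (the spelling normalization is omitted), and the function names are hypothetical:

```python
# Normalize tokens before tagging so the input matches the Tueba-D/Z
# model's vocabulary, recording the changes so the original surface
# forms can be restored afterwards.
NORMALIZE = {"``": '"', "''": '"', "--": "-"}

def normalize_for_tagging(tokens):
    """Return normalized tokens plus a map of index -> original form."""
    normalized, restore = [], {}
    for i, tok in enumerate(tokens):
        if tok in NORMALIZE:
            restore[i] = tok
            normalized.append(NORMALIZE[tok])
        else:
            normalized.append(tok)
    return normalized, restore

def restore_originals(tokens, restore):
    """Undo the normalization after tagging."""
    return [restore.get(i, tok) for i, tok in enumerate(tokens)]
```

The round trip guarantees that only the tagger ever sees the normalized forms; the treebank keeps the original tokens.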

After incorporating the automatic mate POSTAG and FEATS, the UD sentences were aligned with Tiger using the mapping in #13, and LEMMA (for content words) and POSTAG and FEATS (for all tokens) were updated with gold values from Tiger. The only exception is the feature Voice, whose value is passed through from mate, since it is not annotated on the auxiliary in Tiger.
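The merging policy for FEATS can be sketched as below, treating FEATS as dicts. This is a simplified illustration of the rule described above, not the actual conversion code:

```python
def merge_feats(mate_feats, tiger_feats):
    """Overwrite automatic (mate) features with gold Tiger features,
    except Voice, which Tiger does not annotate on auxiliaries and is
    therefore passed through from the mate output."""
    merged = dict(tiger_feats)
    if "Voice" in mate_feats:
        merged["Voice"] = mate_feats["Voice"]
    return merged
```

So a mate Voice=Pass survives the merge even when every other feature is replaced by the gold Tiger value.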

Ordinal numbers split into two tokens in UD (e.g., "3 .") were rejoined as in Tiger, and compounds were reanalyzed in order to provide consistent FEATS for all subparts. If any subpart of a compound was tagged as NE by the mate parser, the deprel flat was used; otherwise, compound was used.
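Both rules can be sketched as small helpers. These are hypothetical simplifications: the ordinal rejoiner shown here would also merge a sentence-final digit with the following period, which the real alignment against Tiger avoids.

```python
def rejoin_ordinal(tokens):
    """Rejoin an ordinal split into two tokens (e.g. ['3', '.'])
    into one token ('3.'), as in Tiger. Simplified sketch."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i].isdigit() and i + 1 < len(tokens) and tokens[i + 1] == ".":
            out.append(tokens[i] + ".")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def compound_deprel(subpart_xpos_tags):
    """Choose the deprel for compound subparts: 'flat' if any subpart
    was tagged NE (proper noun) by the mate parser, else 'compound'."""
    return "flat" if "NE" in subpart_xpos_tags else "compound"
```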


Mate model analysis:

The accuracy on Tueba-D/Z test data for XPOSTAG is 98.3% and for FEATS as an unanalyzed string is 91.7%. Analyzing the individual morphological tags, the results are:

Feature    Acc.    Prec. Rec.  F1  
Case       0.95580 0.996 0.911 0.952
Definite   0.99653 0.986 0.987 0.986
Foreign    0.99835 0.646 0.714 0.678
Gender     0.96150 0.982 0.929 0.955
Mood       0.99554 0.984 0.959 0.972
NumType    0.99988 0.991 0.998 0.995
Number     0.97852 0.995 0.966 0.980
Person     0.99481 0.986 0.966 0.976
Polarity   0.99998 0.999 0.999 0.999
Poss       0.99993 0.995 0.997 0.996
PronType   0.99734 0.998 0.988 0.993
Reflex     0.99991 0.996 0.992 0.994
Tense      0.99567 0.983 0.962 0.972
VerbForm   0.99492 0.993 0.965 0.978
Voice      0.99934 0.934 0.967 0.950

Unsurprisingly, Case and Gender are the least accurate of the frequent features. Foreign could potentially be removed, although inspection of the UD instances suggests its precision is relatively high.
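Per-feature precision/recall/F1 as in the table above could be computed along these lines, counting a true positive when gold and predicted FEATS carry the same value for the feature. This is one plausible scoring scheme, not necessarily the exact script used:

```python
def feature_prf(gold, pred, feature):
    """Precision, recall, and F1 for one morphological feature over
    aligned (gold, predicted) FEATS dicts. A value predicted where the
    gold disagrees counts as both a false positive and, if the gold has
    a value, a false negative."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gv, pv = g.get(feature), p.get(feature)
        if pv is not None and pv == gv:
            tp += 1
        elif pv is not None:
            fp += 1
            if gv is not None:
                fn += 1
        elif gv is not None:
            fn += 1
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```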

adrianeboyd commented 6 years ago

I should have referenced #14 in the comment; adding it here to create the link.