google-code-export / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

POS tagset extracted from French maltparser model looks very strange #225

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
POS tagset extracted from French maltparser model looks very strange:

Tagset [null] for layer [de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS] contains [39] tags:
/CC /P /PONCT 4/DET ADJ ADJWH ADV ADVWH CC CLO CLR CLS CS DET DETWH ET I NC NPP
P P+D P+PRO PONCT PREF PRO PROREL PROWH V VIMP VINF VPP VPR VS _9/NC _OPE/NC
_S/ET _S/NPP _an/NC _h/NC

It should be some French Treebank tagset, but the slashes in several of the
tags look odd. The homepage of the model notes that there should normally be
coarse-grained and fine-grained tags, so I am not sure what to make of this.
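
To make the oddity concrete, here is a minimal sketch in plain Python (not DKPro or MaltParser code; the variable names are made up for illustration) that splits the 39 reported tags into plain ones and ones containing a slash. It also checks whether the part after the last slash is itself one of the plain tags. This is only an observation about the reported strings, not a diagnosis of the extraction code.

```python
# The 39 tags exactly as reported in the extraction output above.
reported = (
    "/CC /P /PONCT 4/DET ADJ ADJWH ADV ADVWH CC CLO CLR CLS CS DET DETWH ET I "
    "NC NPP P P+D P+PRO PONCT PREF PRO PROREL PROWH V VIMP VINF VPP VPR VS "
    "_9/NC _OPE/NC _S/ET _S/NPP _an/NC _h/NC"
).split()

# Tags without a slash look like ordinary treebank tags.
plain = [tag for tag in reported if "/" not in tag]

# Tags with a slash are the ones that look strange.
suspicious = [tag for tag in reported if "/" in tag]

# For every suspicious tag, keep only the part after the last slash.
suffixes = {tag: tag.rsplit("/", 1)[1] for tag in suspicious}

# Suspicious tags whose suffix is NOT one of the plain tags.
unexplained = [tag for tag, suffix in suffixes.items() if suffix not in plain]

print(len(reported), "tags in total")
print(len(plain), "plain tags:", " ".join(plain))
print(len(suspicious), "tags with a slash:", " ".join(suspicious))
print("suffix not among the plain tags:", unexplained)
```

If every slash tag ends in a plain tag, that would be consistent with something other than the bare tag ending up in the tag field, but that is an assumption to verify against the model itself.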

Original issue reported on code.google.com by richard.eckart on 11 Sep 2013 at 9:07

GoogleCodeExporter commented 9 years ago
Do you mean this page? http://www.maltparser.org/mco/french_parser/fremalt.html

It mentions the part-of-speech tags of the MElt tagger (Denis and Sagot, 2009).
See: http://raweb.inria.fr/rapportsactivite/RA2009/alpage/uid89.html
Paper: http://atoll.inria.fr/~sagot/pub/paclic09tagging.pdf

"In the original FTB, words are split into 13 main categories, themselves
divided into 34 subcategories. The version of the treebank we used was
obtained by converting subcategories into a tagset consisting of 28 tags, with
a granularity that is intermediate between categories and subcategories.
Basically, these tags enhance main categories with information on the mood of
verbs and a few other lexical features. This expanded tagset has been shown to
give the best statistical parsing results for French (Crabbé and Candito,
2008). [Footnote 2: This tagset is known as TREEBANK+ in (Crabbé and Candito,
2008), and since then as CC (Candito et al., 2009).]"

This page has some more references, including (Crabbé and Candito, 2008):
http://alpage.inria.fr/statgram/frdep/fr_stat_dep_parsing.html

The tagset with the 28 tags is on page 8 of this paper:
http://alpage.inria.fr/statgram/frdep/Publications/crabbecandi-taln2008-final.pdf

Looking at this tagset, it seems that something is going wrong in your POS
tagset extraction from the maltparser model.

Original comment by eckle.kohler on 12 Sep 2013 at 6:32