dan-zeman / interset

Interset is an interlingua for morphosyntactic tag sets, needed in many tasks in natural language processing.

PA0MS = Invalid tag (Freeling conversion) #4

Closed: livyreal closed this issue 6 years ago.

livyreal commented 7 years ago

The Interset conversion from UD tags to Freeling tags (EAGLES tagset) generated some tags that do not exist in Freeling. For example, PRON PronType=Art|Number=Sing|Gender=Masc is converted to PA0MS.

In the Freeling Portuguese tagset (EAGLES), the Pronoun category (P) has no type value A (article).

We have discussed this specific issue here and agreed that the best Freeling tag for this line is PD0MS00: the cases tagged as PronType=Art are all determiners (articles), but they all replace nouns, and the UD guidelines for PRON say: "Pronouns are words that substitute for nouns or noun phrases". The correct forms should therefore be:

PA0MS = PD0MS
PA0FS = PD0FS
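One way to apply the two agreed replacements is as a post-processing step on the converted tags. The following is a minimal Python sketch, assuming tags are handled as plain strings; the names TAG_FIXES, fix_tag and fix_tags are hypothetical and are not part of Interset or Freeling. Only the PA0MS and PA0FS mappings come from this issue.

```python
# Hypothetical post-processing helper (not part of Interset or Freeling).
# It rewrites the article-pronoun tags produced by the conversion into the
# demonstrative-pronoun tags agreed in this issue; any other tag is kept as-is.

# The only two replacements agreed in the discussion.
TAG_FIXES = {
    "PA0MS": "PD0MS",
    "PA0FS": "PD0FS",
}


def fix_tag(tag: str) -> str:
    """Return the agreed replacement for tag, or tag unchanged if none applies."""
    return TAG_FIXES.get(tag, tag)


def fix_tags(tags: list[str]) -> list[str]:
    """Apply fix_tag to every tag in a sequence, preserving order."""
    return [fix_tag(t) for t in tags]


if __name__ == "__main__":
    print(fix_tags(["PA0MS", "PA0FS", "PD0MS"]))  # → ['PD0MS', 'PD0FS', 'PD0MS']
```

A lookup table keeps the fix explicit and easy to extend if further replacements are agreed for the other invalid tags.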

Other invalid tags are described here and repeated below. We are currently discussing them.

TAGSET: Tag DR0MP: Invalid code 'R' for feature 'type'
TAGSET: Tag DR0FS: Invalid code 'R' for feature 'type'
TAGSET: Unknown category for tag 
TAGSET: No rule to get short version of tag ''.
TAGSET: Unknown category for tag 
TAGSET: No rule to get short version of tag ''.
TAGSET: Tag DR0MS: Invalid code 'R' for feature 'type'
TAGSET: Tag DR0FP: Invalid code 'R' for feature 'type'
TAGSET: Tag DR0MS: Invalid code 'R' for feature 'type'
TAGSET: Tag PA0MS: Invalid code 'A' for feature 'type'
TAGSET: Tag PA0FS: Invalid code 'A' for feature 'type'
TAGSET: Tag DR0FS: Invalid code 'R' for feature 'type'
TAGSET: Tag DR0MS: Invalid code 'R' for feature 'type'
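Since the log repeats several tags, it can help to collapse it into distinct (tag, code, feature) reports before discussing them. Below is a minimal Python sketch, assuming the messages look exactly like the lines above; summarize, INVALID_CODE and SAMPLE_LOG are hypothetical names, and the script only parses the text, it does not call Interset or Freeling.

```python
import re
from collections import Counter

# Hypothetical triage helper for TAGSET messages.
# Assumes one message per line, formatted as in the log above.
INVALID_CODE = re.compile(
    r"^TAGSET: Tag (?P<tag>\S+): "
    r"Invalid code '(?P<code>[^']*)' for feature '(?P<feature>[^']*)'"
)

# The log from this issue, copied verbatim (trailing spaces dropped).
SAMPLE_LOG = """\
TAGSET: Tag DR0MP: Invalid code 'R' for feature 'type'
TAGSET: Tag DR0FS: Invalid code 'R' for feature 'type'
TAGSET: Unknown category for tag
TAGSET: No rule to get short version of tag ''.
TAGSET: Unknown category for tag
TAGSET: No rule to get short version of tag ''.
TAGSET: Tag DR0MS: Invalid code 'R' for feature 'type'
TAGSET: Tag DR0FP: Invalid code 'R' for feature 'type'
TAGSET: Tag DR0MS: Invalid code 'R' for feature 'type'
TAGSET: Tag PA0MS: Invalid code 'A' for feature 'type'
TAGSET: Tag PA0FS: Invalid code 'A' for feature 'type'
TAGSET: Tag DR0FS: Invalid code 'R' for feature 'type'
TAGSET: Tag DR0MS: Invalid code 'R' for feature 'type'
"""


def summarize(log_lines):
    """Split TAGSET messages into invalid-code reports and other messages.

    Returns two Counters: one keyed by (tag, code, feature) for the
    'Invalid code' messages, and one keyed by the full text of any other
    non-empty message (e.g. the 'Unknown category' lines for an empty tag).
    """
    invalid = Counter()
    other = Counter()
    for raw in log_lines:
        line = raw.strip()
        if not line:
            continue
        m = INVALID_CODE.match(line)
        if m:
            invalid[(m["tag"], m["code"], m["feature"])] += 1
        else:
            other[line] += 1
    return invalid, other


if __name__ == "__main__":
    invalid, other = summarize(SAMPLE_LOG.splitlines())
    for (tag, code, feature), n in invalid.most_common():
        print(f"{tag}: invalid {feature} code {code!r} ({n}x)")
    for message, n in other.most_common():
        print(f"{message} ({n}x)")
```

Grouped this way, the log reduces to the two PA0 tags discussed above, four determiner tags (DR0MP, DR0FS, DR0MS, DR0FP) rejected for the type code R, and two pairs of messages about an empty tag.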