UniversalDependencies / UD_Portuguese-Bosque

This Universal Dependencies (UD) Portuguese treebank.

PRON PronType=Art = Freeling tag? #25

Open livyreal opened 8 years ago

livyreal commented 8 years ago

This issue is a smaller part of issue #21.

The Interset conversion proposal (UD -> Freeling tagset) sets:

PRON PronType=Art|Number=Sing|Gender=Masc = PA0MS

However, in the Freeling (EAGLES) tagset the POS Pronoun (P) has no value A (article).

Occurrences in [1] and [2]:

bosque_CF.udep.conll:6  o       o       PRON    DET_M_S_@SUBJ>  PronType=Art|Number=Sing|Gender=Masc    15      nsubj
bosque_CF.udep.conll:1  Os      o       PRON    DET_M_P_@SUBJ>  PronType=Art|Number=Plur|Gender=Masc    5       nsubj
bosque_CF.udep.conll:1  O       o       PRON    DET_M_S_@SUBJ>  PronType=Art|Number=Sing|Gender=Masc    9       nsubj
bosque_CF.udep.conll:5  os      o       PRON    DET_M_P_@SUBJ>  PronType=Art|Number=Plur|Gender=Masc    17      nsubj
bosque_CF.udep.conll:2  O       o       PRON    DET_M_S_@SUBJ>  PronType=Art|Number=Sing|Gender=Masc    8       nsubj
bosque_CF.udep.conll:1  O       o       PRON    DET_M_S_@SUBJ>  PronType=Art|Number=Sing|Gender=Masc    7       nsubj
bosque_CF.udep.conll:1  Os      o       PRON    DET_M_P_@SUBJ>  PronType=Art|Number=Plur|Gender=Masc    7       nsubj
bosque_CF.udep.conll:3  os      o       PRON    DET_M_P_@SUBJ>  PronType=Art|Number=Plur|Gender=Masc    8       nsubj
bosque_CF.udep.conll:3  o       o       PRON    DET_M_S_@<SC    PronType=Art|Number=Sing|Gender=Masc    0       xcomp
bosque_CF.udep.conll:3  o       o       PRON    DET_M_S_@<SC    PronType=Art|Number=Sing|Gender=Masc    0       root
bosque_CF.udep.conll:18 as      o       PRON    DET_F_P_@P<     PronType=Art|Number=Plur|Gender=Fem     8       conj
bosque_CF.udep.conll:13 o       o       PRON    DET_M_S_@P<     PronType=Art|Number=Sing|Gender=Masc    9       nmod
bosque_CP.udep.conll:26 o       o       PRON    INDP_M_S_@P<    PronType=Indp|PronType=Art|Number=Sing|Gender=Masc      15      nmod
bosque_CP.udep.conll:32 o       o       PRON    DET_M_S_@P<     PronType=Art|Number=Sing|Gender=Masc    30      det
bosque_CP.udep.conll:6  a       o       PRON    DET_F_S_@P<     PronType=Art|Number=Sing|Gender=Fem     4       nmod
bosque_CP.udep.conll:22 a       o       PRON    DET_F_S_@<PIV   PronType=Art|Number=Sing|Gender=Fem     21      dep
bosque_CP.udep.conll:45 o       o       PRON    DET_M_S_@SUBJ>  PronType=Art|Number=Sing|Gender=Masc    53      nsubj
bosque_CP.udep.conll:1  O       o       PRON    DET_M_S_@SUBJ>  PronType=Art|Number=Sing|Gender=Masc    8       nsubj
bosque_CP.udep.conll:1  O_que   o_que   PRON    DET_M_S_@ACC>   PronType=Art|Number=Sing|Gender=Masc    4       dobj
bosque_CP.udep.conll:19 a       o       PRON    INDP_F_S_@>N    PronType=Indp|PronType=Art|Number=Sing|Gender=Fem       20      dep
bosque_CP.udep.conll:1  Os      o       PRON    DET_M_P_@SUBJ>  PronType=Art|Number=Plur|Gender=Masc    19      nsubj
bosque_CP.udep.conll:7  os      o       PRON    DET_M_P_@<ACC   PronType=Art|Number=Plur|Gender=Masc    1       dobj
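A listing like the one above can be reproduced with a short script. The following is a minimal sketch, assuming tab-separated CoNLL lines with the columns in the order shown in the listing (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL); the function name `find_article_pronouns` is hypothetical. Because some tokens carry two `PronType` values (e.g. `PronType=Indp|PronType=Art`), the features are checked as a list rather than parsed into a dictionary.

```python
def find_article_pronouns(lines):
    """Return (id, form, xpos, deprel) for every PRON token with PronType=Art.

    Assumes tab-separated columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL.
    """
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue
        token_id, form, _lemma, upos, xpos, feats = cols[:6]
        # FEATS may repeat PronType, so test membership in the split list.
        if upos == "PRON" and "PronType=Art" in feats.split("|"):
            hits.append((token_id, form, xpos, cols[7]))
    return hits


# Two tokens taken from the first example sentence in this issue.
sample = [
    "2\tO\to\tPRON\tDET_M_S_@SUBJ>\tPronType=Art|Number=Sing|Gender=Masc\t8\tnsubj",
    "3\tque\tque\tPRON\tINDP_M_S_@SUBJ>\tPronType=Indp|PronType=Rel|Number=Sing|Gender=Masc\t5\tnsubj",
]
matches = find_article_pronouns(sample)
```

On a real file, the same function could be called on an open file handle, e.g. `find_article_pronouns(open(path, encoding="utf-8"))`, where `path` points to a decompressed copy of one of the files in [1] or [2].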

At first, Determiner Article looks like a good tag for it. However, looking at the contexts where this tag appears, we realized that it behaves more like a pronoun.

O que nos preocupa é o cumprimento da ordem judicial.
2    O    o    PRON    DET_M_S_@SUBJ>    PronType=Art|Number=Sing|Gender=Masc    8    nsubj        
3    que    que    PRON    INDP_M_S_@SUBJ>    PronType=Indp|PronType=Rel|Number=Sing|Gender=Masc    5    nsubj        
4    nos    nós    PRON    PERS_M/F_1P_ACC_@ACC>    Reflex=Yes|PronType=Prs|Case=Acc|Person=1|Number=Plur|Gender=None    5    dobj        
5    preocupa    preocupar    VERB    V_PR_3S_IND_@FS-N<    Mood=Ind|Tense=Pres|Person=3|Number=Sing    2    acl:relcl  

O que ficar de mim é a consciência do que fui e que os outros recordarão -- ou não.

1    O    o    PRON    DET_M_S_@SUBJ>    PronType=Art|Number=Sing|Gender=Masc    8    nsubj        
2    que    que    PRON    INDP_M_S_@SUBJ>    PronType=Indp|PronType=Rel|Number=Sing|Gender=Masc    3    nsubj        
3    ficar    ficar    VERB    V_FUT_3S_SUBJ_@FS-N<    Mood=Subj|Tense=Fut|Person=3|Number=Sing    1    acl:relcl        
4    de    de    ADP    PRP_@<SA    _    5    case        
5    mim    eu    PRON    PERS_M/F_1S_PIV_@P<    PronType=Prs|Person=1|Number=Sing|Gender=None    3    nmod    

Running this kind of sentence through the Freeling online demo, we get "o" as DA0MS0 or PD0MS00. Both possibilities are also in the Freeling PT dictionary.

Considering the UD guidelines on pronouns, which say that pronouns substitute for nouns or noun phrases, I would prefer to use the tag PD0MS00 for the UD line PRON PronType=Art|Number=Sing|Gender=Masc

@claudiafreitas agreed by e-mail, thinking of the following examples:

O que fizeram com o Parreira --> Aquilo que fizeram com o Parreira

O que ficar de mim é a ... --> Aquilo que ficar de mim...

"aquilo" is a demonstrative pronoun.
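The preferred mapping can be sketched in code. This is only an illustration, not the Interset implementation: the function name `freeling_tag_for_art_pronoun` is hypothetical, and the generalization from PD0MS00 to other genders and numbers (M/F in the fourth position, S/P in the fifth, `0` when the feature is missing or unrecognized) is an assumption based on the tags quoted in this issue.

```python
# Assumed letter codes for the gender and number positions of the tag.
GENDER = {"Masc": "M", "Fem": "F"}
NUMBER = {"Sing": "S", "Plur": "P"}


def freeling_tag_for_art_pronoun(feats):
    """Map a UD FEATS string of a PRON with PronType=Art to a PD0..00 tag.

    Returns None when PronType=Art is absent, so other pronouns are untouched.
    """
    pairs = [p.split("=", 1) for p in feats.split("|") if "=" in p]
    # PronType can occur twice (e.g. Indp and Art), so check the pair list.
    if ["PronType", "Art"] not in pairs:
        return None
    values = dict(pairs)
    gender = GENDER.get(values.get("Gender"), "0")
    number = NUMBER.get(values.get("Number"), "0")
    return f"PD0{gender}{number}00"


tag = freeling_tag_for_art_pronoun("PronType=Art|Number=Sing|Gender=Masc")
```

For the UD line discussed above, this yields the demonstrative pronoun tag PD0MS00 instead of the PA0MS tag from the original proposal.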

For reference:


[1] Bosque 7.5 Universal dependencies, file bosque_CP.udep.conll.gz
[2] Bosque 7.5 Universal dependencies, file bosque_CF.udep.conll.gz
[3] Bosque version 7.3, converted by Dan Zeman, available at http://github.com/UniversalDependencies/UD_Portuguese
[4] Linguateca version of Bosque CoNLL (7.3), http://www.linguateca.pt/floresta/CoNLL-X/

livyreal commented 8 years ago

Reported at https://github.com/dan-zeman/interset/issues/4