UniversalDependencies / UD_Portuguese-Bosque

This is the Universal Dependencies (UD) Portuguese treebank.

participles as adjectives vs. verb #53

Open livyreal opened 7 years ago

livyreal commented 7 years ago

It seems all participles are tagged as verbs, even in contexts where they work as adjectives. The Spanish documentation says:

The class of adjectives in Spanish UD also includes ordinal numbers and participial adjectives, both behaving as adjectives morphologically and syntactically. Note that participles are word forms that may share properties and usage of adjectives and verbs. Depending on context, they may be classified as either VERB or ADJ.

Maybe @claudiafreitas and @luizafrizzo can help with this.

A few examples:

16  com com ADP PRP|@N<PRED _   20  case    _   _
17  o   o   DET <artd>|ART|M|S|@>N  Gender=Masc|Number=Sing 20  det _   _
18  seu seu DET <poss>|<si>|DET|M|S|@>N Gender=Masc|Number=Sing 20  det _   _
19  reduzido    reduzir VERB    <mv>|V|PCP|M|S|@ICL-N<  Gender=Masc|Number=Sing 20  acl _   _
20  grupo   grupo   NOUN    <np-idf>|N|M|S|@P<  Gender=Masc|Number=Sing 9   nmod    _   _
21  de  de  ADP PRP|@N< _   22  case    _   _
22  homens  homem   NOUN    <np-idf>|N|M|P|@P<  Gender=Masc|Number=Plur 20  nmod    _   _

1   Uma um  DET <arti>|ART|F|S|@>N  Gender=Fem|Number=Sing  2   det _   _
2   situação  situação  NOUN    <np-idf>|N|F|S|@NPHR    Gender=Fem|Number=Sing  0   root    _   _
3   complicada  complicar   VERB    <mv>|V|PCP|F|S|@ICL-N<  Gender=Fem|Number=Sing  2   acl _   _
4   porque  porque  SCONJ   KS|@SUB _   8   mark    _   _
5   Hélder Hélder PROPN   PROP|M|S|@SUBJ> Gender=Masc|Number=Sing 8   nsubj   _   _
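
To get a rough list of candidates like the two above, one could scan the CoNLL-U file for tokens tagged VERB whose XPOS carries `PCP` and that attach to a noun via `acl`. This is only a sketch: the helper names (`parse_rows`, `adjective_candidates`) are made up for illustration, it assumes tab-separated CoNLL-U columns, and it assumes the `PCP` tag in XPOS is a reliable marker of participles in this treebank.

```python
from typing import Dict, List

FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_rows(lines: List[str]) -> List[Dict[str, str]]:
    """Parse tab-separated CoNLL-U token lines, skipping comments and blank lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        rows.append(dict(zip(FIELDS, line.split("\t"))))
    return rows


def adjective_candidates(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return VERB tokens whose XPOS contains PCP and that attach to a NOUN via acl."""
    by_id = {row["id"]: row for row in rows}
    hits = []
    for row in rows:
        head = by_id.get(row["head"])
        if (
            row["upos"] == "VERB"
            and "PCP" in row["xpos"].split("|")
            and row["deprel"] == "acl"
            and head is not None
            and head["upos"] == "NOUN"
        ):
            hits.append(row)
    return hits


# First three tokens of the second example; XPOS is trimmed to its core tags.
sample = [
    "1\tUma\tum\tDET\tART|F|S\tGender=Fem|Number=Sing\t2\tdet\t_\t_",
    "2\tsituação\tsituação\tNOUN\tN|F|S\tGender=Fem|Number=Sing\t0\troot\t_\t_",
    "3\tcomplicada\tcomplicar\tVERB\tV|PCP|F|S\tGender=Fem|Number=Sing\t2\tacl\t_\t_",
]

candidates = adjective_candidates(parse_rows(sample))
print([row["form"] for row in candidates])
```

Such a filter only surfaces candidates for review; whether each one should become ADJ still depends on the criteria agreed on for the treebank.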

vcvpaiva commented 7 years ago

Yes, I agree. It seems to me that for this issue we should NOT follow the Spanish corpus. There are lots of discussions in the UD forum, and Luiza had already produced some guidelines, if I remember correctly.

luizafrizzo commented 7 years ago

Hello there. The participles are all tagged as verbs because that was “inherited” from the original Bosque corpus. Bosque's standard is to tag all participles as V, specifying that they are V-PCP, regardless of whether they are clearly adjectives. The only exceptions are cases where the participle clearly acts as a noun, in which case it is tagged as such.

However, the general UD guidelines state the following: 

Note that participles are word forms that may share properties and usage of adjectives and verbs. Depending on language and context, they may be classified as either VERB or ADJ.

To annotate the other Portuguese corpus, the Portuguese-BR UD treebank, it seemed to us (though this was not stated anywhere) that the following criteria were used:

  1. Participles in passive voice sentences without auxiliaries were tagged as V (ex: “Em 2007, o STF aceitou denúncia contra os 40 suspeitos de envolvimento no suposto esquema denunciado_V em 2005 pelo então deputado federal Roberto Jefferson”)
  2. Participles that modify nouns in sentences that were not in the passive voice were tagged as ADJ (ex: “Com Jadson muito apagado_ADJ e Denilson e Casemiro ineficientes na saída de bola”)
  3. Participles preceded by the auxiliaries ficar/estar were tagged as ADJ (ex: “No entanto, como ressaltei, essas iniciativas não estavam articuladas_ADJ […]”)
  4. Participles preceded by the auxiliaries ter/ser were tagged as V (ex: “A carga não foi prejudicada_V”)

When we converted the Mac-Morpho corpus, we took into consideration these “rules” and also developed a few more to help us with the participles. We used the following criteria:

  1. Participles preceded by the auxiliaries ter/ser/haver were tagged as V (ex: “Os dois carros são vendidos_PCP com ágio, em o mercado paralelo”)
  2. Participles preceded by the auxiliaries ficar/estar were tagged as ADJ (ex: “O viaduto ficou completamente interditado_PCP até_PREP a as 9h10.”)
  3. Participles in passive voice sentences with an explicit agent were tagged as V (ex: “Segundo pesquisa realizada_PCP por o Datafolha em o  último dia 25 de julho, Serra tem 30% de as  intenções de voto, contra 27% de Erundina.”)
  4. Participles in passive voice sentences without auxiliaries were tagged as V (ex: “O caderno especial sobre os 10 anos de a derrota de a emenda que restabelecia eleições diretas, publicado_PCP em o domingo passado, conseguiu opiniões unânimes de os leitores”.) 
  5. Participles used in “conventional” word combinations were tagged as ADJ (as in “semana passada”, “países desenvolvidos”, “revendedora autorizada”, “revistas especializadas”, among many others).
  6. Participles that modify nouns in sentences that were not in the passive voice were tagged as ADJ (ex: “Além de devoto, Ricupero é um disciplinado_PCP estudioso_N de religião.”)
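
The six criteria above can be read as a small decision procedure. The sketch below is one possible encoding, not the code used in the conversion: the `ParticipleContext` fields and the `tag_participle` function are hypothetical names, the order in which the rules are tried is an assumption (the list does not state a precedence), and judgments such as "passive" or "conventional combination" are taken as inputs rather than computed. It returns the UD tag `VERB` where the criteria say V.

```python
from dataclasses import dataclass
from typing import Optional

VERB_AUXILIARIES = {"ter", "ser", "haver"}  # rule 1
ADJ_AUXILIARIES = {"ficar", "estar"}        # rule 2


@dataclass
class ParticipleContext:
    """Hand-annotated facts about one participle (hypothetical structure)."""
    aux_lemma: Optional[str] = None  # lemma of the auxiliary before the participle, if any
    passive: bool = False            # the clause is judged to be passive
    explicit_agent: bool = False     # the passive has an explicit agent ("por ...")
    conventional: bool = False       # fixed combination such as "semana passada"
    modifies_noun: bool = False      # the participle modifies a noun


def tag_participle(ctx: ParticipleContext) -> Optional[str]:
    """Apply rules 1-6 in listed order; return None when no rule decides."""
    if ctx.aux_lemma in VERB_AUXILIARIES:
        return "VERB"  # rule 1
    if ctx.aux_lemma in ADJ_AUXILIARIES:
        return "ADJ"   # rule 2
    if ctx.passive and ctx.explicit_agent:
        return "VERB"  # rule 3
    if ctx.passive and ctx.aux_lemma is None:
        return "VERB"  # rule 4
    if ctx.conventional:
        return "ADJ"   # rule 5
    if ctx.modifies_noun and not ctx.passive:
        return "ADJ"   # rule 6
    return None        # undecided: left to the fallback rules


# "são vendidos": auxiliary ser
print(tag_participle(ParticipleContext(aux_lemma="ser")))
# "pesquisa realizada por o Datafolha": passive with explicit agent
print(tag_participle(ParticipleContext(passive=True, explicit_agent=True, modifies_noun=True)))
# "um disciplinado estudioso": modifies a noun, not passive
print(tag_participle(ParticipleContext(modifies_noun=True)))
```

Returning `None` for undecided cases keeps the hard examples visible instead of forcing a tag on them.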

When the participles were hard to classify even with these rules (sometimes it can be hard to tell whether a sentence is a case of iv or vi, for example), we applied the following rules:

  1. Participles that satisfied (most of) Pimenta-Bueno's (1986) rules to identify adjectives would be tagged as ADJ. 
  2. Participles that did not satisfy (most of) Pimenta-Bueno's (1986) rules to identify adjectives would be tagged as V.

Pimenta-Bueno’s rules state that all of these will be ADJ:

vcvpaiva commented 7 years ago

Many thanks for the three (!!!) sets of guidelines, @luizafrizzo! I am doing some rewriting on your message, if I may. I think your (and @claudiafreitas's) 8 rules should have numbers, as we're likely to need to refer to them quite often in what follows. Pimenta-Bueno's rules can have Roman numerals, as I expect they are here mostly for tradition. I am also adding letters to the "rules" of McDonald's corpus, which I expect were learned from the corpus rather than decided in advance.

Now please correct me if I misremember, but your rules and Pimenta-Bueno's were more or less consistent, except that yours yielded more verbs? For "livro amassado por Fulano", P-B said adjective and your rules said verb, right? And I hope the 4 rules for McDonald's corpus are OK with you?

vcvpaiva commented 7 years ago

@claudiafreitas did the work over the weekend produce a summary or a count of the mistagged participles according to your and @luizafrizzo's sets of rules?

claudiafreitas commented 7 years ago

@arademaker was going to implement that, wasn't he?

fcbr commented 7 years ago

We began working on converting the CoNLL files to triples so that we can query them via SPARQL.

Here's one sample query: list all the sentences that have an auxiliary verb followed by a participle.

http://wnpt.brlcloud.com:10035/repositories/bosque#query/r/aux-seguido-pcp
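
For readers without access to the triple store, roughly the same question can be asked of the CoNLL-U rows directly. The sketch below is not the SPARQL query behind the link: the function name `aux_then_participle` is made up, "followed by" is taken to mean the immediately next token, auxiliaries are assumed to carry UPOS `AUX`, and participles are assumed to carry `PCP` in XPOS. The sample annotation of "A carga não foi prejudicada" is hand-written for illustration and is not copied from the treebank.

```python
from typing import Iterable, List, Tuple


def aux_then_participle(token_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (auxiliary form, participle form) pairs for adjacent tokens of one sentence."""
    rows = [
        line.split("\t")
        for line in token_lines
        if line.strip() and not line.startswith("#")
    ]
    pairs = []
    for prev, cur in zip(rows, rows[1:]):
        prev_is_aux = prev[3] == "AUX"              # column 4: UPOS
        cur_is_pcp = "PCP" in cur[4].split("|")     # column 5: XPOS
        if prev_is_aux and cur_is_pcp:
            pairs.append((prev[1], cur[1]))         # column 2: FORM
    return pairs


# Hypothetical annotation, written by hand for this example.
sentence = [
    "1\tA\to\tDET\tART|F|S\t_\t2\tdet\t_\t_",
    "2\tcarga\tcarga\tNOUN\tN|F|S\t_\t5\tnsubj:pass\t_\t_",
    "3\tnão\tnão\tADV\tADV\t_\t5\tadvmod\t_\t_",
    "4\tfoi\tser\tAUX\tV|PS|3S|IND\t_\t5\taux:pass\t_\t_",
    "5\tprejudicada\tprejudicar\tVERB\tV|PCP|F|S\t_\t0\troot\t_\t_",
]

print(aux_then_participle(sentence))
```

Requiring strict adjacency misses cases where an adverb sits between the auxiliary and the participle; relaxing that is a design choice the real query may or may not make.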

arademaker commented 7 years ago

And we can also list the dependency relation between the two tokens, if there is one:

https://goo.gl/W4dt2p

fcbr commented 7 years ago

After we fixed #91, is there anything else to be done in this issue? Or can we close it?

livyreal commented 7 years ago

#141 has to be closed first.