Open livyreal opened 7 years ago
Yes, I agree. It seems to me that for this issue we should NOT follow the Spanish corpus. There have been lots of discussions in the UD forum, and Luiza had already produced some guidelines, if I remember correctly.
Hello there. The participles are all tagged as verbs because that was “inherited” from the original Bosque corpus. Bosque's standard is to tag all participles as V, specifying that they are V-PCP, regardless of whether they are clearly adjectives. The only exceptions are cases where the participle clearly acts as a noun; those are tagged as nouns.
However, the general UD guidelines state the following:
Note that participles are word forms that may share properties and usage of adjectives and verbs. Depending on language and context, they may be classified as either VERB or ADJ.
To annotate the other Portuguese corpus, the Portuguese-BR UD treebank, it seemed to us (although this was not stated anywhere) that the following criteria were used:
When we converted the Mac-Morpho corpus, we took into consideration these “rules” and also developed a few more to help us with the participles. We used the following criteria:
When the participles were hard to classify even with these rules (sometimes it can be hard to tell whether a sentence is an iv or a vi, for example), we applied the following rules:
Pimenta-Bueno's rules state that all of these will be ADJ:
Many thanks for the three (!!!) sets of guidelines @luizafrizzo! I am doing some rewriting on your message, if I may. I think your (and @claudiafreitas's) 8 rules should have numbers, as we're likely to need to refer to them quite often in what follows. Pimenta-Bueno's rules can have roman numerals, as I expect they are here mostly for tradition. I am adding some letters to the McDonald corpus "rules", which I expect were learned from the corpus instead of decided in advance.
Now please correct me if I misremember, but your rules and Pimenta-Bueno's were mostly consistent, except that yours had more cases of verbs? For "livro amassado por Fulano", P-B said adjective and your rules said verb, right? And I hope the 4 McDonald rules are OK for you?
@claudiafreitas, did the work over the weekend produce a summary or a count of the participles that are mistagged according to your and @luizafrizzo's sets of rules?
@arademaker was going to implement this, wasn't he?
We began working on converting the CoNLL files to triples so that we can query them via SPARQL.
Here's one sample query: list me all the sentences that have an auxiliary verb followed by a participle.
http://wnpt.brlcloud.com:10035/repositories/bosque#query/r/aux-seguido-pcp
And we can also list the dependency relation between the two tokens, if there is one:
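For readers without access to the SPARQL endpoint, here is a rough sketch of the same check in plain Python over CoNLL-U text. It is not the triple-based query linked above, just an approximation of its logic. It assumes UD-style annotation, with auxiliaries tagged `AUX` and participles marked `VerbForm=Part`; the sample sentence and the function names are made up for illustration.

```python
# Hypothetical one-sentence CoNLL-U sample (invented for illustration).
SAMPLE = """\
# text = O livro foi amassado .
1\tO\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t2\tdet\t_\t_
2\tlivro\tlivro\tNOUN\t_\tGender=Masc|Number=Sing\t4\tnsubj:pass\t_\t_
3\tfoi\tser\tAUX\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin\t4\taux:pass\t_\t_
4\tamassado\tamassar\tVERB\t_\tGender=Masc|Number=Sing|VerbForm=Part\t0\troot\t_\t_
5\t.\t.\tPUNCT\t_\t_\t4\tpunct\t_\t_
"""


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        feats = {}
        if cols[5] != "_":
            feats = dict(f.split("=", 1) for f in cols[5].split("|"))
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "feats": feats,
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7],
        })
    if tokens:
        sentences.append(tokens)
    return sentences


def aux_followed_by_participle(sentence):
    """List (aux form, participle form, deprel or None) for adjacent pairs."""
    hits = []
    for left, right in zip(sentence, sentence[1:]):
        if left["upos"] == "AUX" and right["feats"].get("VerbForm") == "Part":
            if left["head"] == right["id"]:
                relation = left["deprel"]    # auxiliary depends on participle
            elif right["head"] == left["id"]:
                relation = right["deprel"]   # participle depends on auxiliary
            else:
                relation = None              # no direct dependency
            hits.append((left["form"], right["form"], relation))
    return hits


for sent in parse_conllu(SAMPLE):
    for hit in aux_followed_by_participle(sent):
        print(hit)
```

A `None` in the third position means the two tokens are adjacent but not directly linked in the tree, which may be worth inspecting by hand.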
After we fixed #91, is there anything else to be done in this issue? Or can we close it?
It seems all participles are tagged as verbs, even in contexts where they work as adjectives. The Spanish documentation says:
Maybe @claudiafreitas and @luizafrizzo can help with this.
A few examples: