PhyloStar / UDTelugu

Universal Dependency Tagging for Telugu
Apache License 2.0

About splitting and not splitting #2

Closed: nishkalavallabhi closed this issue 7 years ago

nishkalavallabhi commented 7 years ago

A day after our last discussion, I am wondering whether we should split words that are not split in the text.

For example, in "waaLLu maMciwaaLLu .", waaLLu is tagged as a Pronoun and maMciwaaLLu, which is written as a single word, is tagged as a Noun. If we instead see the sentence "waaLLu maMci waaLLu", perhaps we need to tag it as: Pronoun Adjective Pronoun. What do you think?
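To make the two options concrete, here is a minimal sketch. It assumes the UD universal POS labels PRON, NOUN and ADJ for Pronoun, Noun and Adjective; the `LEXICON` dictionary and the `tag_sequence` helper are hypothetical and only cover the forms mentioned in this issue, they are not part of the repo.

```python
# Hypothetical toy lexicon: only the forms discussed in this issue.
LEXICON = {
    "waaLLu": "PRON",
    "maMci": "ADJ",
    "maMciwaaLLu": "NOUN",  # tag currently given to the unsplit form
}


def tag_sequence(sentence):
    """Tag whitespace-separated tokens exactly as written, without splitting."""
    return [(token, LEXICON[token]) for token in sentence.split()]


# Option 1: keep the text as it is (two tokens).
unsplit = tag_sequence("waaLLu maMciwaaLLu")

# Option 2: the same content written as three words.
split = tag_sequence("waaLLu maMci waaLLu")

print(unsplit)
print(split)
```

The point of the sketch is only that the tag sequence depends on how the text is tokenized, so the splitting decision has to be made before the tags can be compared.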

PhyloStar commented 7 years ago

@cagri says that it is better to keep whatever is there in the grammar. I noticed that the grammar contains both variants of kanTe, such as "naakanTe" and "naa kanTe". Maybe both are necessary, so that the parser can learn both variations.

Regarding maMciwaaLLu: it has to be changed to Pronoun.

nishkalavallabhi commented 7 years ago

Why should maMciwaaLLu be a pronoun?

PhyloStar commented 7 years ago

Lexically, waaLLu is a pronoun. In this example, it is a collective noun for people. I think this has to do with splitting the words.

nishkalavallabhi commented 7 years ago

I also noticed something in the first sentence of 8.8: mee oor(u) emiti? The brackets are perhaps there because the Telugu version of the sentence has ooremiti as a single word, not two words. Should we take this as one word or two words, then?
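For reference, the CoNLL-U format used by UD can record both views at once: a range line such as 2-3 keeps the surface form, and the lines after it give the individual syntactic words. Below is a minimal sketch of the two choices for this sentence. The helper names are hypothetical, all annotation columns are left unspecified ("_") because the tags are still undecided, the final "?" is omitted for brevity, and the split form "ooru" is an assumption read off the bracketed "oor(u)".

```python
def conllu_line(token_id, form):
    """Build one 10-column CoNLL-U line: ID, FORM, then 8 unspecified fields."""
    return "\t".join([token_id, form] + ["_"] * 8)


def as_single_word():
    """Treat ooremiti as one token, as written in the Telugu text."""
    return [
        conllu_line("1", "mee"),
        conllu_line("2", "ooremiti"),
    ]


def as_multiword_token():
    """Keep the surface form on a range line and list the two words below it."""
    return [
        conllu_line("1", "mee"),
        conllu_line("2-3", "ooremiti"),  # surface token covering words 2 and 3
        conllu_line("2", "ooru"),        # assumed split form
        conllu_line("3", "emiti"),
    ]


print("\n".join(as_single_word()))
print()
print("\n".join(as_multiword_token()))
```

Whether the range-line notation is appropriate here is exactly the open question of this issue; the sketch only shows what each answer would look like in the file.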