UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

The VerbForm feature #214

Closed: ftyers closed this issue 7 years ago

ftyers commented 9 years ago

In Kazakh and the other Turkic languages (I'm not sure about Turkish; I'll leave that to Çağrı to comment on), we have a 5-way distinction in verb forms:[1]

1) Finite forms: "I write the letter"
2) Verbal nouns: "I like his writing", "I think that writing is fun"
3) Verbal adjectives: "He is a book writing man" [a man who writes books]
4) Verbal adverbs: "While writing this I had a peculiar thought"
5) Participles: "I could have been writing that book"

Finite forms are easy: they get VerbForm=Fin. Verbal adverbs are also quite easy; we can tag them with VerbForm=Trans (although this terminology is really unusual for Turkic, the description in the documentation seems to fit). Verbal nouns seem to work fine with VerbForm=Ger.

The problem is with verbal adjectives and participles. There is a VerbForm=Part value, which the documentation describes as "a non-finite verb form that shares properties of verbs and adjectives". However, in Kazakh, participles have more in common with verbal adverbs than with verbal adjectives, and verbal adjectives are used quite differently, for example to form relative clauses.

Given how they are used in other languages (e.g. Russian), I would like to make verbal adjectives VerbForm=Part and then have another value for participles. Note that making them all VerbForm=Trans would be a suboptimal solution, as not all participles are ambiguous with verbal adverbs and vice versa. It would almost be possible to use VerbForm=Inf, given its description "may be used together with auxiliaries to form periphrastic tenses", but again this would really be stretching the terminology.

So, in brief, my suggestion to start the discussion would be:

Finite = VerbForm=Fin
Verbal noun = VerbForm=Ger
Verbal adjective = VerbForm=Part
Verbal adverb = VerbForm=Trans
Participle = VerbForm=Inf
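To make the proposal concrete, here is a minimal Python sketch, assuming the tokens are stored in CoNLL-U, where the FEATS column holds `Name=Value` pairs separated by `|` (or `_` when empty), sorted alphabetically by feature name. The category keys such as `verbal_adjective` and the helper `set_verbform` are hypothetical names for illustration; the values are the v1 ones proposed above, not an adopted standard.

```python
# Hypothetical mapping of the proposal above (v1 VerbForm values).
# The keys are illustrative shorthand, not official UD category names.
PROPOSED_VERBFORM = {
    "finite": "Fin",
    "verbal_noun": "Ger",
    "verbal_adjective": "Part",
    "verbal_adverb": "Trans",
    "participle": "Inf",
}


def set_verbform(feats: str, value: str) -> str:
    """Return a CoNLL-U FEATS string with VerbForm set to `value`.

    FEATS is '_' when empty; otherwise it is Name=Value pairs joined by '|',
    kept in alphabetical order of feature name.
    """
    pairs = {}
    if feats != "_":
        for item in feats.split("|"):
            name, val = item.split("=", 1)  # split only on the first '='
            pairs[name] = val
    pairs["VerbForm"] = value  # add or overwrite the VerbForm feature
    # Re-sort by feature name, case-insensitively, and serialize.
    ordered = sorted(pairs.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={val}" for name, val in ordered)
```

For example, `set_verbform("Tense=Past", PROPOSED_VERBFORM["verbal_adjective"])` returns `"Tense=Past|VerbForm=Part"`, and `set_verbform("_", "Fin")` returns `"VerbForm=Fin"`.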

Any thoughts?

  [1] There is a nice description with examples on our wiki: http://wiki.apertium.org/wiki/Turkic_lexicon#Non-finite_verb_forms
coltekin commented 9 years ago

A few comments and some data from Turkish:

dan-zeman commented 9 years ago

While I have no strong opinion at the moment about what to do with the Turkic verb forms, let me contribute some examples from Czech and Russian. (In a sense, Czech was very influential on what you find in the universal features, because it was the first language covered by Interset; that's how the term transgressive got there.)

https://docs.google.com/spreadsheets/d/1p-pAsIxjFUkKfPpn1xvrsuuZ52Qvr1Z5p1NWlpsfXHc/edit?usp=sharing

The VerbForm values alone do not cover all related Czech words either (with passives there is even one distinction not present in Russian). But we classify some of these words as NOUN or ADJ rather than VERB. Combined with VerbForm and Tense, this part-of-speech choice is enough to distinguish all cases.

I think we need to find a balance between two UD guidelines: (1) annotate the same things the same way, but (2) do not overdo it. As long as the current definitions match a Turkic verb form (or can be extended to it) to a reasonable extent, we want to use one of the current feature values. But if that would be too unnatural, we want to add new values. Obviously, there is no exact way of measuring “reasonable extent” and “too unnatural”, but you (plural :-)) can probably judge that better than I can.

spyysalo commented 7 years ago

Closing as there is no recent activity and the v2 guidelines are now being published. Please consider opening a new issue with reference to the new guidelines and this discussion if there are open questions relating to this issue.

jonorthwash commented 7 years ago

@spyysalo, does v2 address this issue? Otherwise, I don't think "no activity" constitutes a valid reason to close an issue; it simply means that the issue was not receiving attention from those who might be able to address it.

dan-zeman commented 7 years ago

@jonorthwash: Agreed that "no activity" is not a good reason to close the issue. However, the v2 guidelines have reshaped the VerbForm landscape, so if further discussion is needed, I think it would be better to open a fresh issue, as @spyysalo suggests.

In a nutshell: verbal nouns should now be VerbForm=Vnoun. The value Ger still exists, but its use is discouraged where an alternative is available. Transgressives are now renamed to converbs, VerbForm=Conv. There is no change in the definition of participles. What are sometimes called "adverbial participles" are still covered under the term converb, while the value Part is reserved for the remaining cases, which are supposed to be closer to adjectives. You guys should decide which of the two is better for particular word forms in Kazakh and, if possible, synchronize the outcome with the other Turkic languages.
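The v1-to-v2 value changes summarized above can be sketched as a small Python conversion over CoNLL-U FEATS strings. This is only an illustration under stated assumptions: `upgrade_verbform`, `upgrade_feats` and the `ger_is_verbal_noun` flag are hypothetical names, and the sketch deliberately does not touch Part, because choosing between Part and Conv for a given Kazakh form is the annotators' decision, not a mechanical rename.

```python
def upgrade_verbform(v1_value: str, ger_is_verbal_noun: bool = True) -> str:
    """Map a v1 VerbForm value to v2 (hypothetical helper).

    - Trans was renamed to Conv (converb).
    - Ger still exists in v2, but verbal nouns should use Vnoun; whether a
      given Ger token really is a verbal noun is an annotator judgment,
      so it is passed in as an assumption (default: yes).
    - All other values, including Part, are returned unchanged.
    """
    if v1_value == "Trans":
        return "Conv"
    if v1_value == "Ger" and ger_is_verbal_noun:
        return "Vnoun"
    return v1_value


def upgrade_feats(feats: str, ger_is_verbal_noun: bool = True) -> str:
    """Apply upgrade_verbform to the VerbForm feature of a FEATS string."""
    if feats == "_":  # empty FEATS column
        return feats
    out = []
    for item in feats.split("|"):
        name, val = item.split("=", 1)
        if name == "VerbForm":
            val = upgrade_verbform(val, ger_is_verbal_noun)
        out.append(f"{name}={val}")
    # Only the value changes, so the original alphabetical order is kept.
    return "|".join(out)
```

For example, `upgrade_feats("Case=Nom|VerbForm=Trans")` returns `"Case=Nom|VerbForm=Conv"`, while `upgrade_verbform("Part")` stays `"Part"`.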