UniversalConceptualCognitiveAnnotation / docs

UCCA Documentation
https://universalconceptualcognitiveannotation.github.io/
Rediscussing determiners #38

Closed: dotdv closed this issue 5 years ago

dotdv commented 6 years ago

Questions that came up in a conversation between Omri and me: should determiners (articles and demonstratives) be E or F? Also should they be included in scene units or not (if they are marked F it opens up the possibility of excluding them).

[The_F man_C]_A went to work or [The_E man_C]_A went to work

John saw [a_F great_D show_P]_A or John saw [[a_E]_[P-] great_D [show_C]_[-P]]_A

And should this also include demonstratives? [This_F cake_C]_A is great or [This_E cake_C]_A is great
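To compare the competing analyses mechanically, the flat bracket notation used in the examples above can be read into a small data structure. The following is only a minimal sketch: the function name `parse_flat_unit` and both regular expressions are illustrative assumptions, not part of any UCCA tooling, and the sketch handles only flat units such as "[The_F man_C]_A", not nested or discontiguous ones.

```python
import re

# One labelled token such as "The_F" or "man_C" (hypothetical pattern).
TOKEN = re.compile(r"(?P<word>[^\s\[\]_]+)_(?P<cat>[A-Z])")
# One flat bracketed unit such as "[The_F man_C]_A" (no nesting supported).
UNIT = re.compile(r"\[(?P<body>[^\[\]]+)\]_(?P<cat>[A-Z])")


def parse_flat_unit(text):
    """Return (unit_category, [(word, category), ...]) for the first flat unit."""
    match = UNIT.search(text)
    if match is None:
        raise ValueError("no bracketed unit found")
    children = [
        (tok.group("word"), tok.group("cat"))
        for tok in TOKEN.finditer(match.group("body"))
    ]
    return match.group("cat"), children


# The two candidate analyses of the same sentence from the thread.
option_f = parse_flat_unit("[The_F man_C]_A went to work")
option_e = parse_flat_unit("[The_E man_C]_A went to work")

# List the positions inside the unit where the two analyses disagree.
differences = [
    (f_tok, e_tok)
    for f_tok, e_tok in zip(option_f[1], option_e[1])
    if f_tok != e_tok
]
```

Under this sketch, the two analyses differ only in the label on the determiner, which is the question being asked.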

nschneid commented 6 years ago

One observation is that F currently applies to items that syntactically relate two things, right, like copulas, light verbs, tense auxiliaries, and infinitive TO? Determiners differ in simply modifying one thing.

Does F ever occur in non-scene units at present?

There is, however, considerable crosslinguistic variation in the use of articles, at least (I don't know about demonstratives). So by that standard it would make sense to distinguish them from contentful elaborators like adjectives, relative clauses, and non-participant PPs.

A compromise would be to add a category. :) M for functional modifier? The function word categories would then be:

  • F for clausal support/tense-aspect marking
  • R for adpositions/case/relativizers/participant subordinators
  • M for (non-clausal?) non-referential modifiers: articles (and demonstratives?). In general it could also include gender/noun class morphemes. This category would exclude possessive determiners, which refer to a separate entity than the modified noun.
    • Alternate terminology: I for identificational marker. (A function word that helps you figure out which entity the NP refers to, without itself referring to an entity.)
  • N for non-scene connectors (coordination)
  • L for parallel scene connectors (coordination/subordination)

Finally, I considered the possibility of merging determiners with Q because they provide number information in some cases in English. But they don't mark number in all languages, and I like restricting Q to expressions that are primarily conveying quantity/measurement rather than other things, like definiteness.

omriabnd commented 6 years ago

F is not necessarily meant to be used for elements that relate two entities. It also covers, for instance, expletive "it" and cases of agreement that require adding a word (e.g., determiners on adjectives of a definite noun in Hebrew). So we could use it here too. F can be used in all cases where a token does not evoke any relation relevant to the foundational layer and is not an argument of such a relation.

The major reason I see for not including determiners as Fs is that in some languages they may be translated into non-determiners. So the question is whether the foundational layer should cover this kind of distinction, and the answer should probably be no.

nschneid commented 6 years ago

> The major reason I see for not including determiners as Fs is that in some languages they may be translated into non-determiners.

Could you elaborate—what else do they get translated into?

Also, what about German articles, which bear gender, number, definiteness, and case? Do they warrant R because they reflect case?

omriabnd commented 5 years ago

> Could you elaborate: what else do they get translated into?

For instance in Hebrew, "this" is translated into a pronominal post-nominal modifier (this dog <--> הכלב הזה).

> Also, what about German articles, which bear gender, number, definiteness, and case? Do they warrant R because they reflect case?

Yes. F is a residual category. Basically it means no other category applies.

nschneid commented 5 years ago

> For instance in Hebrew, "this" is translated into a pronominal post-nominal modifier (this dog <--> הכלב הזה).

Ah right. In the Hebrew UD treebank, at least, this is treated as adjectival modification.

Demonstratives do feel a bit more contentful than articles, though. Can articles be translated into non-determiners in some language (apart from the expression of case, which may translate to an adposition)?

omriabnd commented 5 years ago

> Demonstratives do feel a bit more contentful than articles, though. Can articles be translated into non-determiners in some language (apart from the expression of case, which may translate to an adposition)?

I can't think of a case in Hebrew, but WALS discusses the different realizations of definiteness (https://wals.info/chapter/37), which could be part of the morphology, for instance.

nschneid commented 5 years ago

Right—if definiteness is encoded morphologically in the noun in some languages, isn't that an argument that it's on the functional end of the spectrum?

I guess the trickiest case would be in languages with markers that are ambiguous between demonstratives and definite articles, like the Eastern Ojibwa example. Similarly, in German the word for "one" is ambiguous with the indefinite article (https://wals.info/chapter/38). UCCA already needs to make the latter distinction, right? So Q vs. E ambiguity would become Q vs. F ambiguity if articles are changed to F. If demonstratives are kept as E, then Ojibwa would have a similar ambiguity of E vs. F.

My sense is that treating caseless articles as F would be good in principle, but we'd have to decide whether it's worth the effort to change it—a lot of data/examples would be affected.

omriabnd commented 5 years ago

> Right, if definiteness is encoded morphologically in the noun in some languages, isn't that an argument that it's on the functional end of the spectrum?

Well, yes, but Function is only meant to be a residual category. If we had a layer that dealt with definiteness, determiners wouldn't be Fs.

> My sense is that treating caseless articles as F would be good in principle, but we'd have to decide whether it's worth the effort to change it, since a lot of data/examples would be affected.

I agree, though since it's a closed class, we may be able to reliably detect it automatically.

nschneid commented 5 years ago

What would the policy be for את? On the one hand it encodes formal case, but on the other hand it's only used for definite NPs.

omriabnd commented 5 years ago

I would follow the convention and mark it as R (it is really a case marker).

omriabnd commented 5 years ago

Resolution: Let's turn caseless articles into Fs

dotdv commented 5 years ago

> Resolution: Let's turn caseless articles into Fs

Would we like to apply this already or wait with it? (I'm asking because I see that I've already started to apply this on examples I added.) Also, if we do make this change, do we want to exclude articles from scene units from now on ("the_F show_P" instead of "[the_E show_C]_P")?

omriabnd commented 5 years ago

I think we can wait with this until we've finished with the guidelines. Re excluding articles: it doesn't make much difference, since we normalize the position of Fs anyway.
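
The resolution in this thread (caseless articles become F, case-bearing articles go to R, and articles form a closed class that might be detected automatically) could be sketched roughly as below. This is an illustrative sketch only: the function `relabel_articles`, its parameters, and the word lists are hypothetical and do not come from the UCCA codebase. It only touches tokens currently tagged E, so a Q-tagged numeral that is homonymous with an article (the German "one" case mentioned above) is left alone, and demonstratives are not changed.

```python
def relabel_articles(tokens, articles, articles_bear_case=False):
    """Relabel E-tagged articles in a list of (word, category) pairs.

    tokens: list of (word, category) pairs for one unit.
    articles: closed-class set of lowercase article forms (assumed, per language).
    articles_bear_case: if True, articles are relabelled R instead of F,
        following the suggestion in the thread for case-bearing articles.
    """
    target = "R" if articles_bear_case else "F"
    relabelled = []
    for word, category in tokens:
        # Only articles currently analysed as Elaborators are changed;
        # Q-tagged numerals and demonstratives keep their category.
        if category == "E" and word.lower() in articles:
            category = target
        relabelled.append((word, category))
    return relabelled


# Hypothetical closed-class lists, for illustration only.
ENGLISH_ARTICLES = {"the", "a", "an"}
GERMAN_ARTICLES = {"der", "die", "das", "ein", "eine"}

english = relabel_articles([("The", "E"), ("man", "C")], ENGLISH_ARTICLES)
demonstrative = relabel_articles([("This", "E"), ("cake", "C")], ENGLISH_ARTICLES)
german_article = relabel_articles(
    [("der", "E"), ("Hund", "C")], GERMAN_ARTICLES, articles_bear_case=True
)
german_numeral = relabel_articles(
    [("ein", "Q"), ("Hund", "C")], GERMAN_ARTICLES, articles_bear_case=True
)
```

A word-list lookup like this cannot resolve forms that are ambiguous between an article and a demonstrative, such as the Eastern Ojibwa case raised above; those would presumably still need manual review.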