UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Ancient Greek Relation Subtypes #958

Open mr-martian opened 1 year ago

mr-martian commented 1 year ago

Currently, Ancient Greek has the following subtypes enabled:

advcl:cmp, advmod:emph, aux:pass, csubj:pass, flat:foreign, flat:name, nsubj:outer, nsubj:pass, obl:agent, obl:arg

In PTNK, I have additionally made use of the following:

   2647 nmod:poss
    468 acl:relcl
     91 obl:tmod
     32 obl:npmod
     12 cc:preconj
      2 cop:outer
      2 advcl:relcl

Should I document these or should I reduce some or all of them to the non-subtyped relation?
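
A tally like the one above can be produced straight from a CoNLL-U file. The following is a minimal sketch, not the script actually used for PTNK: the function name `count_subtypes` and the toy `SAMPLE` rows are assumptions made for illustration. It counts every DEPREL value (column 8) that contains a colon.

```python
from collections import Counter


def count_subtypes(conllu_text):
    """Count subtyped relations (DEPREL values containing ':') in CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        # Skip malformed rows, multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        deprel = cols[7]
        if ":" in deprel:
            counts[deprel] += 1
    return counts


# Hypothetical toy rows, not taken from any real treebank.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "w1", "_", "NOUN", "_", "_", "0", "root", "_", "_"],
    ["2", "w2", "_", "PRON", "_", "_", "1", "nmod:poss", "_", "_"],
    ["3", "w3", "_", "NOUN", "_", "_", "1", "obl:tmod", "_", "_"],
    ["4", "w4", "_", "PRON", "_", "_", "1", "nmod:poss", "_", "_"],
    ["5", "w5", "_", "ADV", "_", "_", "1", "advmod", "_", "_"],
])

counts = count_subtypes(SAMPLE)
for rel, n in counts.most_common():
    print(f"{n:7d} {rel}")
```

Only the basic DEPREL column is inspected here; subtypes that occur only in the enhanced DEPS column would need separate handling.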

dan-zeman commented 1 year ago

@daghaug @gcelano

Stormur commented 1 year ago

If I can add my 2 cents, however coming from my experience with Latin (if harmonisation with it has some importance), I could comment:

nschneid commented 1 year ago
  • advcl:relcl: this absolutely needs documentation defining it, because as of now we have two different relations using this label. In Latin it is for free relatives; in English it is for "sentence relatives", which Latin currently treats by means of advcl:pred.

English also uses advcl:relcl for free relatives where the WH word is an adverb, e.g. "I looked where you were sitting": advcl:relcl(where, sitting).

Stormur commented 1 year ago

So there are two competing uses of advcl:relcl? And what about "non-adverbial" relative words?

mr-martian commented 1 year ago

So, how I'm currently using them:

nschneid commented 1 year ago

So there are two competing uses of advcl:relcl? And what about "non-adverbial" relative words?

I've updated the docs to explain this more clearly: https://github.com/UniversalDependencies/docs/blob/pages-source/_en/dep/advcl-relcl.md

(The page on the site isn't updating for some reason)

amir-zeldes commented 1 year ago

obl:npmod: we are not using it and I sincerely do not think it makes sense

It is an odd label linguistically, to be sure, but if you want to use obl:tmod, then I think you will probably need obl:npmod as well. The tmod label is used for temporal noun phrases used adverbially, as in 1. When a similar phrase describes a non-temporal quantity, you need some kind of label, and that's what obl:npmod does:

  1. Let's meet next week/obl:tmod
  2. Let's meet the way/obl:npmod we planned originally

It has been pointed out that obl:tmod isn't really a syntactic category but more of a semantic subtype, so in a way obl:npmod subsumes it and I suppose it would basically cover accusativus graecus.

cc:preconj: I do not think this has any meaning at all since it is directly retrievable from the linear order of tokens

I think this is not 100% true, but realistically you are right that it is mostly predictable. Hypothetically you could get something like "I arrived and/cc then both/cc:preconj danced and/cc sang", where it's not totally obvious what would be cc:preconj. That said, even when it is trivial, it's sometimes nice to be able to easily find all cases that have a cc:preconj, and it's easy enough to do, so why not?

nmod:poss this is essentially "is the dependent Case=Gen or Poss=Yes?" which, yeah, not that helpful

That may be true, but it might still be nice for comparability to other languages which use nmod:poss.

Stormur commented 1 year ago

So, how I'm currently using them:

* `nmod:poss` this is essentially "is the dependent `Case=Gen` or `Poss=Yes`?" which, yeah, not that helpful

I thought of it in part as the difference between subjective/objective genitive (e.g., for the wider public, amor matris 'the love for the mother vs. the love from the mother', both expressed by the genitive), but then I am not sure we can label the subjective one as "possessive"; probably this pertains at some level of reference annotation? Given an nmod relation, the feature Poss=Yes should not change this picture.


* `obl:npmod` my starting point was word aligning with the Hebrew treebank (which uses this for the infinitive absolute) and projecting, so this relation is present in places where the Septuagint copies the Hebrew construction of reduplicating the verb 

But then, is it still related to Latin? :thinking:


obl:npmod: we are not using it and I sincerely do not think it makes sense

It is an odd label linguistically, to be sure, but if you want to use obl:tmod, then I think you will probably need obl:npmod as well. The tmod label is used for temporal noun phrases used adverbially, as in 1. When a similar phrase describes a non-temporal quantity, you need some kind of label, and that's what obl:npmod does:

1. Let's meet next week/obl:tmod

2. Let's meet the way/obl:npmod we planned originally

It has been pointed out that obl:tmod isn't really a syntactic category but more of a semantic subtype, so in a way obl:npmod subsumes it and I suppose it would basically cover accusativus graecus.

It is, like many others, and for this reason it appears only as a subtype. Many subtypes (most?) are semantic; even relcl is in some sense (there just happens to be a reference to something in the matrix clause).

We are using it "transversally", so it also appears for advmod.

I do not think that tmod and npmod are related, exactly for this reason: with regard to "adverbiality", this is already subsumed under UD's use of the oblique obl relation; so tmod is purely semantic, or let's say lexical, in that it depends either on the word (e.g. semper 'always') or on the predicate (e.g. vivo 'to live' with some argument denoting an event). I am not sure why it should cover accusativus graecus if this is already covered by obl (in its current interpretation) and if the purely syntactical fact of not being introduced by an element like an adposition is self-evident: what I mean is that a simple treebank query directly retrieves such cases.

In the example

2. Let's meet the way/obl:npmod we planned originally

I do not see what it is adding. It is already obl, and the fact that it appears as such without a preposition is probably lexically determined, so maybe it should be annotated at the token level. If np stands for noun phrase, it is stating the obvious, as an oblique is already intended to be one.
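
The "simple treebank query" mentioned above can be approximated without any subtype: collect the obl dependents that have no `case` child. This is only a sketch under stated assumptions. The helper names `parse_sentences` and `bare_obliques` are hypothetical, and the toy sentence carries an illustrative annotation, not a gold one.

```python
def parse_sentences(conllu_text):
    """Yield sentences as lists of 10-column rows (basic tokens only)."""
    sent = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if sent:
                yield sent
                sent = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep plain integer IDs; skip multiword ranges and empty nodes.
        if len(cols) == 10 and cols[0].isdigit():
            sent.append(cols)
    if sent:
        yield sent


def bare_obliques(sentence):
    """Return FORMs of obl dependents (any subtype) that have no `case` child."""
    # HEAD ids of all tokens attached as case (subtypes of case included).
    has_case = {cols[6] for cols in sentence if cols[7].split(":")[0] == "case"}
    return [cols[1] for cols in sentence
            if cols[7].split(":")[0] == "obl" and cols[0] not in has_case]


# Hypothetical toy annotation: "We meet next week in Rome".
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "We", "we", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "meet", "meet", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "next", "next", "ADJ", "_", "_", "4", "amod", "_", "_"],
    ["4", "week", "week", "NOUN", "_", "_", "2", "obl", "_", "_"],
    ["5", "in", "in", "ADP", "_", "_", "6", "case", "_", "_"],
    ["6", "Rome", "Rome", "PROPN", "_", "_", "2", "obl", "_", "_"],
])

for sentence in parse_sentences(SAMPLE):
    print(bare_obliques(sentence))
```

In this toy sentence only "week" is returned, since "Rome" has the `case` child "in".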


cc:preconj: I do not think this has any meaning at all since it is directly retrievable from the linear order of tokens

I think this is not 100% true, but realistically you are right that it is mostly predictable. Hypothetically you could get something like "I arrived and/cc then both/cc:preconj danced and/cc sang", where it's not totally obvious what would be cc:preconj. That said, even when it is trivial, it's sometimes nice to be able to easily find all cases that have a cc:preconj, and it's easy enough to do, so why not?

Hm... this might be one further reason to tinker with UD's annotation of co-ordinations :thinking: I admit this still does not convince me totally about the usefulness of this subrelation instead of moot redundancy for a very functional relation...


nmod:poss this is essentially "is the dependent Case=Gen or Poss=Yes?" which, yeah, not that helpful

That may be true, but it might still be nice for comparability to other languages which use nmod:poss.

True, but then we need a clear definition which as of now does not seem to be there. There is probably also an overlap with det... or also just with the fact of a PronType=Prs depending as nmod?
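
The working heuristic quoted above ("is the dependent Case=Gen or Poss=Yes?") can be written down as a small check. This is a sketch of that heuristic only, not a proposed definition of nmod:poss. The names `parse_feats` and `poss_candidates` are hypothetical, and the Latin rows are toy annotations built around the amor matris example.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Case=Gen|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def poss_candidates(rows):
    """Return FORMs of nmod dependents (any subtype) with Case=Gen or Poss=Yes."""
    found = []
    for cols in rows:
        if cols[7].split(":")[0] != "nmod":
            continue
        feats = parse_feats(cols[5])
        # FEATS values may be comma-separated lists, e.g. Case=Acc,Gen.
        is_gen = "Gen" in feats.get("Case", "").split(",")
        is_poss = feats.get("Poss") == "Yes"
        if is_gen or is_poss:
            found.append(cols[1])
    return found


# Hypothetical toy rows: "amor matris" and "amor in matrem".
ROWS = [
    ["1", "amor", "amor", "NOUN", "_", "Case=Nom|Number=Sing", "0", "root", "_", "_"],
    ["2", "matris", "mater", "NOUN", "_", "Case=Gen|Number=Sing", "1", "nmod", "_", "_"],
    ["1", "amor", "amor", "NOUN", "_", "Case=Nom|Number=Sing", "0", "root", "_", "_"],
    ["2", "in", "in", "ADP", "_", "_", "3", "case", "_", "_"],
    ["3", "matrem", "mater", "NOUN", "_", "Case=Acc|Number=Sing", "1", "nmod", "_", "_"],
]

print(poss_candidates(ROWS))
```

A purely morphological test like this cannot separate the subjective from the objective reading of amor matris, so a subtype defined this way adds nothing beyond the features.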

Stormur commented 1 year ago

So there are two competing uses of advcl:relcl? And what about "non-adverbial" relative words?

I've updated the docs to explain this more clearly: https://github.com/UniversalDependencies/docs/blob/pages-source/_en/dep/advcl-relcl.md

(The page on the site isn't updating for some reason)

I am now wondering whether or not these are indeed two different phenomena. I am sincerely confused.

... but is the subclause in I looked where you were sitting not rather an object of the main verb? I would instead think of something like Go back whence you came (correct?).

nschneid commented 1 year ago

An adverb can't be a direct object in UD, right? I think an obj has to be a nominal.

(I agree the location phrase is a complement/argument of "look" here, but that's not what UD cares about.)

amir-zeldes commented 1 year ago

It is already obl

Yes, obl:npmod and obl:tmod are subtypes of obl, so that part is natural. In many datasets, including English but also others such as Hebrew or Coptic, the plain obl is used specifically for prepositional phrases. I suspect it was originally a conversion remnant from Stanford Dependencies, which distinguished prep from npadvmod and tmod. These became the prototypes for nmod/obl, obl:npmod and obl:tmod.

Of course, the subtypes are totally optional, but that is the background for why all adverbial NPs (usually with some kind of spatiotemporal or extent semantics) have a subtype in languages that use them. So if you are using :tmod, I would also expect to see :npmod for non-temporal phrases. TBH if I were designing UD from scratch I would have just called such NPs advmod too, since that is essentially what accusativus graecus is, but advmod is prohibited on things not tagged ADV, so we have to use some kind of obl relation - the subtype is just to keep them separate from PP modifiers.

Stormur commented 1 year ago

An adverb can't be a direct object in UD, right? I think an obj has to be a nominal.

(I agree the location phrase is a complement/argument of "look" here, but that's not what UD cares about.)

I was perhaps confused by the fact that look is intransitive in English. But I missed the more important fact that where is "promoted" in the matrix clause. But if this is the case, I do not understand why, keeping advmod(look,where), you were sitting is not just acl:relcl as the "expansion" of where.

Probably I see where this is coming from: an ADV entails an advcl (propositional) and not an acl. But I do not know if this is not accepted by UD/the validator (and actually, this is one further case showing that where is not an "adverb", but a kind of pro-form). Still, another annotation strategy solving it would be to have where you were sitting as a whole as advcl:relcl of look, and then this use of advcl:relcl would be the same as for Latin. But I know the treebanks treat "free relatives" differently.


So if you are using :tmod, I would also expect to see :npmod for non-temporal phrases

Sorry if I am firm about this, but no. There is no logical relation. This all comes from some language-specific logics projected universally. Especially for Latin and Ancient Greek (and many other languages), there is nothing special about prepositionless arguments, as prepositions are just in alternation with Case.

I understand where this comes from, but I see (universally) more sense in a (hypothetic) semantic obl:manner for the way rather than a mechanical obl:npmod.

As for accusativus graecus, one might still envision an adv* annotation, but with advcl, maybe as advcl:pred (by the way, I personally think it is still left to be convincingly proven that accusativus graecus is really an adverbial rather than a second object... but this is another story). Anyway, the relation obl already means (or at least covers) something like "nominal adverbial": then the more meaningful subtype to be used here is arg, to keep track of a parallel complement/adjunct distinction, if this is what an advmod label would imply.

amir-zeldes commented 1 year ago

more sense in a (hypothetic) semantic obl:manner for the way rather than a mechanical obl:npmod.

Sure, that would be perfectly logical and seems fine to me. npmod is just the underspecified one (not saying if it's manner, or extent or something else). I don't much like the label either (no NPs in dependencies), it's just a legacy thing from SD.

As for accusativus graecus, one might still envision an adv* annotation, but with advcl, maybe as advcl:pred

Not if it's not a clause - then I would have expected (and wanted) advmod, but that is forbidden for nouns, and I lost that battle long ago ;)

there is nothing special about prepositionless arguments ... Anyway, the relation obl already means (or at least covers) something like "nominal adverbial"

Yes, that's all correct and UD takes that position explicitly in having obl be the main label for cases with and without prepositions. It's just that in some languages maintainers like to make that distinction, so they use subtypes - these are in no way mandatory. I think if you are using "tmod" also for adverbs and phrases with prepositions, and not using a subtype for other domains, it just ends up being different from how other languages use that subtype. But maybe that's OK - I was just pointing it out, since that subtype comes from UD English and is used differently there.

Stormur commented 1 year ago

But maybe that's OK - I was just pointing it out, since that subtype comes from UD English and is used differently there.

Hm, I have to look into it. But reading from the scant documentation, we seem to be in line. I do not see differences... it is simply independent from adpositions, even in English (judging from the examples in the documentation). tmod itself as a label might come from UD English, but "time complements" are universal...

We also use lmod. I fear other domains would be less defined and more problematic than these ones. Besides, I have not noticed attested relation subtypes for them, apart from subsubtypes of time and place.


As for accusativus graecus, one might still envision an adv* annotation, but with advcl, maybe as advcl:pred

Not if it's not a clause - then I would have expected (and wanted) advmod, but that is forbidden for nouns, and I lost that battle long ago ;)

It might be a nominal clause. But I agree that it would be a lectio difficillima (a 'very difficult interpretation'), not even truly justified. So, currently, obl still is the best (traditional) option.

amir-zeldes commented 1 year ago

it is simply independent from adpositions, even in English

I think that might be ambiguous - just to clarify, in UD English and related datasets following its practices, :tmod only occurs when there is no preposition.

So, currently, obl still is the best (traditional) option.

Agreed!

mr-martian commented 9 months ago

I don't think this is actually resolved. I've been stripping subtypes from my treebank in the process of pushing to the UD repo, but I'd still like to actually include them.
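
For reference, the stripping step can be sketched as below. This is an illustration under stated assumptions, not the script actually used: `strip_subtypes` and `SAMPLE` are hypothetical names, `ENABLED` is copied from the list of currently enabled subtypes at the top of this issue, and only the basic DEPREL column is rewritten (the enhanced DEPS column is left untouched).

```python
# Subtypes currently enabled for Ancient Greek, as listed at the top of this issue.
ENABLED = frozenset({
    "advcl:cmp", "advmod:emph", "aux:pass", "csubj:pass", "flat:foreign",
    "flat:name", "nsubj:outer", "nsubj:pass", "obl:agent", "obl:arg",
})


def strip_subtypes(conllu_text, keep=frozenset()):
    """Reduce subtyped DEPRELs to their base relation unless listed in `keep`."""
    out = []
    for line in conllu_text.split("\n"):
        cols = line.split("\t")
        if len(cols) == 10 and not line.startswith("#"):
            rel = cols[7]
            if ":" in rel and rel not in keep:
                cols[7] = rel.split(":", 1)[0]
            line = "\t".join(cols)
        out.append(line)  # comments and blank lines pass through unchanged
    return "\n".join(out)


# Hypothetical toy rows, not taken from any real treebank.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "\t".join(["1", "w1", "_", "NOUN", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["2", "w2", "_", "PRON", "_", "_", "1", "nmod:poss", "_", "_"]),
    "\t".join(["3", "w3", "_", "NOUN", "_", "_", "1", "obl:arg", "_", "_"]),
])

print(strip_subtypes(SAMPLE, keep=ENABLED))
```

Passing `keep=ENABLED` leaves the already permitted subtypes in place; with the default empty `keep`, every subtype is reduced.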

dan-zeman commented 9 months ago

OK, but then it needs a new milestone. v2.13 is over.