Validation Error: 'cop' not expected to have children inappropriate for Old Irish

UniversalDependencies / docs

Universal Dependencies online documentation

http://universaldependencies.org/

Apache License 2.0


Issue #928 (closed by AdeDoyle 1 year ago)

AdeDoyle commented 1 year ago

I've gotten the following error when trying to validate an upcoming Old Irish treebank:

[L3 Syntax leaf-aux-cop] 'cop' not expected to have children (4:nda:cop --> 3:no:compound)
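To make the error concrete, here is a minimal Python sketch of the kind of check that reports it: a token attached as cop or aux should not itself have dependents, apart from a few exempt relations. The `Token` record, the `leaf_violations` function and the exempt set `ALLOWED_CHILD_DEPRELS` are assumptions for illustration; the official UD validator's actual list and conditions may differ.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    head: int
    deprel: str


# Assumed exemptions for illustration only; the real validator's list may differ.
ALLOWED_CHILD_DEPRELS = {"goeswith", "fixed", "reparandum", "conj", "cc", "punct"}


def leaf_violations(tokens):
    """Return one message per non-exempt child of an aux/cop token."""
    by_id = {t.id: t for t in tokens}
    messages = []
    for child in tokens:
        parent = by_id.get(child.head)
        if parent is None:
            continue  # head 0: the child is the root
        parent_rel = parent.deprel.split(":")[0]  # compare universal relations only
        child_rel = child.deprel.split(":")[0]
        if parent_rel in {"aux", "cop"} and child_rel not in ALLOWED_CHILD_DEPRELS:
            messages.append(
                f"'{parent_rel}' not expected to have children "
                f"({parent.id}:{parent.form}:{parent_rel} --> "
                f"{child.id}:{child.form}:{child_rel})"
            )
    return messages


# The attachment that triggered the error: "no" depends on the copula "nda".
sentence = [
    Token(1, ".i.", 5, "advmod"),
    Token(2, "amal", 5, "mark"),
    Token(3, "no", 4, "compound:prt"),
    Token(4, "nda", 5, "cop"),
    Token(5, "frecṅdircc", 0, "root"),
    Token(6, "sa", 5, "amod"),
]
print(leaf_violations(sentence))
# ["'cop' not expected to have children (4:nda:cop --> 3:no:compound)"]
```

Attaching "no" to token 5 instead makes the list empty, which is why the tree further down in this thread hangs "no" on the predicate.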

I know the copula is restricted in many European languages, particularly modern ones; however, the Old Irish copula is very complex. It inflects for person, number, tense, voice, and more, and it has specific "conjunct" forms that are used only in close compounds with preceding words.

One of these enclitic copula forms is causing the validation error in the sentence amal nondafrecṅdirccsa "for that I am present". Here a relative construction is created by placing the semantically empty verbal particle no before the conjunct copula form da "I am". This allows the nasal n to be infixed between no and da, which gives the copula relative force: nonda "that I am".

This semantically empty particle, no, can also be compounded with verbs in much the same way as meaningful verbal particles can, so the dependency relation used to attach any such particle to a verb, compound:prt, is also used to attach it to a copula. But copulas are not expected to have children. Can this be changed for Old Irish?

martinpopel commented 1 year ago

Technically, the rule in the validator allows several types of child deprels as exceptions, for example fixed. If "no nda" is a multiword copula phrase, 4:nda:cop --> 3:no:compound could be changed into 3:no:cop --> 4:nda:fixed. Without any knowledge of Old Irish, I am not suggesting this is a good solution. On the contrary: the downside of this solution is that "no" would have to be added to the list of Old Irish copula lemmas, because the validator does not (yet) support multiword copulas.
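A minimal sketch of the reanalysis suggested here, assuming a simple token record: "no" becomes the `cop` token and "nda" hangs off it as `fixed`. `make_fixed_copula` and `COPULA_LEMMAS` are hypothetical names; the latter stands in for the per-language copula lemma list the validator consults and is not its real content. The last line shows the downside mentioned: the new `cop` token carries a lemma that is not a copula lemma.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    id: int
    form: str
    lemma: str
    head: int
    deprel: str


COPULA_LEMMAS = {"is"}  # hypothetical: only the copula's own lemma is listed


def make_fixed_copula(tokens, first_id, second_id):
    """Make `first_id` the cop token and attach `second_id` to it with `fixed`."""
    by_id = {t.id: t for t in tokens}
    predicate = by_id[second_id].head  # the copula's original head
    out = []
    for t in tokens:
        if t.id == first_id:
            out.append(replace(t, head=predicate, deprel="cop"))
        elif t.id == second_id:
            out.append(replace(t, head=first_id, deprel="fixed"))
        else:
            out.append(t)
    return out


original = [
    Token(3, "no", "no", 4, "compound:prt"),
    Token(4, "nda", "is", 5, "cop"),
    Token(5, "frecṅdircc", "frecndairc", 0, "root"),
]
reanalysed = make_fixed_copula(original, first_id=3, second_id=4)
for t in reanalysed:
    print(t.id, t.form, t.head, t.deprel)
# The cop token now has lemma "no", which is not a listed copula lemma:
print([t.form for t in reanalysed if t.deprel == "cop" and t.lemma not in COPULA_LEMMAS])
```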

AdeDoyle commented 1 year ago

The difficulty with fixed (or goeswith, which is also supported by the validator) is that the head has to be the first part. That requires that the copula be dependent on no, instead of the other way around. That seems inappropriate given that no has no semantic meaning, and is only there to support the copula. I don't think any grammatical resource for Old Irish would accept that no could be a lemma for the copula.

Another factor to consider is that no is not the only thing that can combine with the copula. Other particles (robad "it might be", indid "is it", diandat "to which they are"), prepositions (indid "in which it is", condid "until it is"), conjunctions (cesu "though it is", másu "if it is", conid "that it is"), and personal pronouns (issum écen "it is necessary for me", níb écen "it is not necessary for ye", etc.) can too. Because Old Irish orthography predates modern spelling standardisation, and because the language has three distinct classes of "infixed pronouns", tens of word forms (if not over a hundred) would have to be added to the list of Old Irish copula lemmas, not just no. The list would have to cover every variant spelling of every part of speech that can combine with any form of the copula. For example, cesu, cisu, ciasu, and cíasu are four variant forms of just one conjunction, all of which would need to be accounted for, even if attested only once.

I think the most appropriate solution might be to alter the validator to allow for multiword copulas, even if just for Old Irish. Is that possible?

nschneid commented 1 year ago

fixed, flat, and goeswith should be understood as headless relations. There is a technical head in the data format (the first word) but linguistically there is no asymmetry asserted between head and modifier.

If you think the copula is linguistically the head, then since UD understands copulas as functional support for other predicates, presumably that predicate should be annotated as the head of no as well.

An analogy might be made to the English contraction didn't: even though n't seems to modify the auxiliary, since auxiliaries are not allowed to be heads we say in UD that both are dependents of the main predicate.
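The UD convention described here can be sketched as a small tree transformation: dependents of an aux or cop token are reattached to that token's own head. This is an illustrative Python sketch, not UD tooling; the tuple layout, the `KEEP` set and the English sentence "I didn't go" are assumptions, and only one level of attachment is lifted.

```python
# Tokens are (id, form, head, deprel) tuples. KEEP is an illustrative
# assumption, not the validator's exact list of exempt relations.
KEEP = {"fixed", "goeswith", "conj", "punct"}


def lift_function_word_children(tokens):
    """Reattach dependents of aux/cop tokens to the aux/cop token's own head."""
    by_id = {tok[0]: tok for tok in tokens}
    lifted = []
    for tid, form, head, deprel in tokens:
        parent = by_id.get(head)
        if (
            parent is not None
            and parent[3].split(":")[0] in {"aux", "cop"}
            and deprel.split(":")[0] not in KEEP
        ):
            head = parent[2]  # the head of the auxiliary/copula, i.e. the predicate
        lifted.append((tid, form, head, deprel))
    return lifted


# English analogy: "n't" attached to "did" moves up to the predicate "go".
english = [(1, "I", 4, "nsubj"), (2, "did", 4, "aux"), (3, "n't", 2, "advmod"), (4, "go", 0, "root")]
print(lift_function_word_children(english))

# Old Irish: "no" moves from the copula "nda" to the predicate.
old_irish = [(3, "no", 4, "compound:prt"), (4, "nda", 5, "cop"), (5, "frecṅdircc", 0, "root")]
print(lift_function_word_children(old_irish))
```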

(There are plenty of linguistic arguments for function words being heads in certain languages, but if you want that you should use a different framework, like SUD.)

AdeDoyle commented 1 year ago

I take your point that UD considers the copula to be functional support for a predicate. However, the fact that the Old Irish copula inflects for person and number means that it acts as both copula and subject, and that subject could reasonably be considered the head of other words and particles that provide functional support. As such, the analogy with English didn't doesn't really work, because did is only an auxiliary, not the auxiliary and subject in one. Where particles like no compound with inflected copula forms, it is reasonable from a UD standpoint to say that their head should be the predicate, not the copula; however, this doesn't account for the reality of the copula being both auxiliary and subject.

Consider the example am rí na hÉrenn "I_am king of Ireland". The copula, am, is inflected to express "I am", and so the subject and the copula are the same word. The noun rí "king" is the predicate. The same meaning can also be expressed another way, using the 3rd sg. form of the copula with a discrete pronoun to highlight the predicate: is meisse rí na hÉrenn (lit. "it_is I king of Ireland"). Here the subject, strictly speaking, is still inflected within the copula, and the predicate is the pronoun meisse. However, the noun rí corresponds to the subject built into the copula, and is generally interpreted as being the subject itself. So, the copula can fluctuate between acting more or less like a typical auxiliary in Old Irish.

The Modern Irish equivalent expression would be is mise rí na hÉireann "I am king of Ireland". Clearly the second of the two Old Irish examples was the template for this development; however, it is not perfectly analogous. Because the modern copula does not inflect, the copula is can be interpreted here simply as "am", while the Old Irish is meant "it is". So, in Modern Irish the pronoun mise is still the predicate, but the noun rí is the only subject, and is distinct from the copula.

This demonstrates that the Old Irish copula was in something of a transitional stage, on its way to becoming what UD would consider an auxiliary in the modern sense. Nevertheless, it would be anachronistic to suggest it is anything other than the copula just because it does not fit the UD template for what an auxiliary should be.

nschneid commented 1 year ago

In many languages, verbs can appear without an overt pronominal subject, which is understood from inflectional information on the verb (pro-drop). Is this what you mean by a combination of copula and subject? I'm not aware of a principle that says a pro-drop copula is treated differently in UD from non-pro-drop copulas in other languages. Unless the subject could, for example, consist of a pronoun with a prepositional phrase (along the lines of 'he with the sword is the king'), and still be merged with the copula.

I am not sure how to tell whether meisse or rí ought to be the subject in UD terms. Taken literally, in English, "It is I, king of Ireland", I would treat that as a case of apposition (appos) between "I" and "king". But if it is grammaticalized in Old Irish, it might be a case of clitic doubling.

AdeDoyle commented 1 year ago

My understanding of pro-dropping is limited, as it's rarely discussed in an Irish context, but my impression is that the term is generally used only where a pronominal subject is possible but unnecessary because of inflectional information provided by the verb. In Modern Irish, where verbs inflect to show the subject, it is grammatically incorrect to also use a pronominal subject.

The relationship between verbal morphology and pronouns in Old Irish is a bit more complicated, to say the least, but the short of it is that independent personal pronouns cannot ever be used with a verb other than the copula. Discrete emphatic pronouns do occur with regular verbs, but they affect the sentence semantically, while pro-dropping only removes redundant pronouns.

Independent pronouns in Old Irish are only used in the kind of construction I mentioned above, where the pronoun forms the topic of the sentence and is therefore fronted. Usually this makes them the predicate of a copula, but they can also be fronted by, for example, an interrogative particle, in tusu rí na hÉrenn? "Are you king of Ireland?", or a vocative particle, a thusu "You!"

As for the Old Irish copula, syntax is crucial: whatever immediately follows the copula is the predicate. For this reason, if it is sufficient to refer to the subject only by person and number, this information is contained in the inflected copula form, so no explicit subject needs to be stated: am rí "I_am a king", at rí "you_are a king", is rí "he/it_is a king", etc. If, for whatever reason, it is necessary to refer to a specific subject, this is placed after the predicate and a 3rd person form of the copula is used: is rí tusu "you are a king" (lit. "it_is a king you"), is rí dauid (lit. "it_is a king David"), it insi ériu ocus albu "Ireland and Britain are islands" (lit. "they_are islands Ireland and Britain"). In a sense it's more like noun-dropping than pro-dropping, because the subject that is dropped is more usually a noun or proper noun than a pronoun. Regardless, the different constructions serve different semantic purposes (if only slightly), so dropping the explicit subject changes the meaning somewhat. As such, its removal is not arbitrary in the way that the removal of pronouns through pro-dropping is.
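As a rough illustration of how the person and number carried by these copula forms end up in UD annotation, the sketch below builds CoNLL-U FEATS strings (Name=Value pairs sorted by name and joined with "|") for the four present-tense forms quoted here. The form-to-feature table covers only these examples, and the Mood, Polarity, Tense and VerbType values are assumed to match those used for nda in the tree for amal nondafrecṅdirccsa in this thread; `copula_feats` is a hypothetical helper.

```python
# Person/number for the present-tense copula forms quoted in this comment only.
COPULA_PERSON_NUMBER = {
    "am": ("1", "Sing"),  # am rí "I am a king"
    "at": ("2", "Sing"),  # at rí "you are a king"
    "is": ("3", "Sing"),  # is rí "he/it is a king"
    "it": ("3", "Plur"),  # it insi ... "they are islands"
}


def copula_feats(form):
    """Build a CoNLL-U FEATS string; non-person/number values are assumptions."""
    person, number = COPULA_PERSON_NUMBER[form]
    feats = {
        "Mood": "Ind",
        "Number": number,
        "Person": person,
        "Polarity": "Pos",
        "Tense": "Pres",
        "VerbType": "Cop",
    }
    # Features are listed in alphabetical order of their names.
    return "|".join(f"{k}={v}" for k, v in sorted(feats.items(), key=lambda kv: kv[0].lower()))


print(copula_feats("am"))
# Mood=Ind|Number=Sing|Person=1|Polarity=Pos|Tense=Pres|VerbType=Cop
```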

I, myself, have given a lot of thought to whether rí ought to be the subject in UD terms in constructions like is meisse rí na hÉrenn, though certainly, meisse, is the predicate there. The choice for subject is between rí and is, the copula itself. If rí is the subject, then the copula acts normally in this construction, but it still inflects in the other. Alternatively, if rí is not the subject itself, but some recapitulation or specification of the subject, then what exactly is it? I think it's probably incorrect to read a comma into my literal translation, "it is I king of Ireland", as you have, as this would suggest apposition where really this is better described as a single clause containing a subject which has already been pointed to by the copula inflection.

I think it's possibly closer to clitic doubling, though I believe this phenomenon only occurs in verb phrases, and I don't know of it ever affecting even a substantive verb, never mind a copula. Even allowing for the possibility that the copula could stand in for another verb to support this construction, I believe clitic doubling generally involves a discrete pronominal token, the clitic, separate from the verb in question. It is not possible to separate such a pronominal clitic from the Old Irish copula. The copula is itself a clitic; it cannot be reduced any further.

It's my opinion this should be seen as a developmental stage of the copula, falling somewhere between what may historically have been a proper verb, and the modern Irish copula which acts more like a normal auxiliary. It may have characteristics of both, but is clearly a fledgling copula, distinct from other verbs in usage and directly traceable to the modern form. The fact that it can stand alone as both copula and subject, even if only in particular cases, I think makes it an exception to the rule that the copula can't be the linguistic head.

nschneid commented 1 year ago

I'm curious how this relates to other analyses of Irish. Any thoughts @kscanne @tlynn747 @eihe?

dan-zeman commented 1 year ago

It is quite common in various languages that the verb inflects for person and number, but I do not see why it should mean that the verb should also be seen as the subject. If such a language has a verbal copula, it is natural that the copula also inflects for person and number to cross-reference the subject.

To return to the original topic of this thread: In amal nondafrecṅdirccsa "for that I am present", the particle non is not separated from da by a space. It seems odd to first separate it as a self-standing syntactic word, only to attach it back to da via the compound relation. To me, the need to use a relation like compound is a strong indicator that nonda should stay as one syntactic word. There might be indicators of the opposite though. The example does not show it but if it is possible to write no as a separate orthographic word (taking into account that orthography was not standardized in the times of Old Irish), then we may either want to make no a syntactic word, as I understand it is done now, and have the attachment issue again, or define an exceptional word with space "no da" for Old Irish.

AdeDoyle commented 1 year ago

Spacing isn't a great way to determine word boundaries in Old Irish. Word separation by spacing was a relatively new invention when Old Irish was being written down, and several syntactic words could be compounded together around a single phonetic stress. Because of the difficulties this can cause, much of my own work to date has actually been focussed on determining a systematic tokenisation standard appropriate for Old Irish. As you'll see in the other issue I currently have open, it was also possible in rare instances for elements that are typically considered part of a word to be separated from it by spacing.

As regards no not being spaced apart from the copula, there are counterexamples for this too. I've included a picture from the St. Gall glosses manuscript here, where no bed is written over two lines.

[Image: St. Gall glosses manuscript, showing no bed written over two lines]

Setting aside spacing, there are some more convincing indicators that analysing nonda as a single word would be inappropriate. Firstly, no is a particle that can be used with a range of other verbs, not just the copula. In such constructions it is common for pronouns to fall between no and the verb. no can also be used with other copula forms, such as bed (see again the picture above). no is a very productive particle in Old Irish, used in a wide range of situations, so it would be strange to force it into being part of the copula rather than allowing it to be a function word supporting the inflection of the copula (whether we term it the subject or not). Secondly, in the construction nonda, the particle is just no, not non, and the copula is just da. The second n you see in the middle is nasalisation, a form of what are known in Celtic languages as Initial Consonant Mutations (ICMs). These ICMs affect the anlaut of a word, and only occur at the beginning of words. There are very few exceptions to this (nominal compounds which have become single words in their own right might contain a fossilised ICM in the middle, between what historically used to be two discrete nouns). So, the fact that da is affected by nasalisation strongly suggests that it was felt as a separate word from no.

The reasoning for using the compound relation is twofold. Firstly, unlike bed, da is a "conjunct" form of the copula, a clitic form used in combinations. A rough analogy might be made to forms like "n't" in "isn't". Secondly, the compound relation is already used where this same particle no is compounded with verbs. The alternative would be to treat infixed object pronouns as part of the verbal morphology, which I've also considered and experimented with, but it's utterly chaotic in terms of tagging morphological features, and very far removed from how the grammar is typically understood. I think at that stage I would just be shoehorning the language into the mould.

EDIT: Editing to say that treating infixed pronouns as verbal morphology, as I suggested above, wouldn't actually provide a solution from a UD standpoint either. Though relatively rare, instances occur where parts of speech other than object pronouns occur between preverbs and the remainder of the verb. These can include nouns and adjectives. Such instances would be unworkable if everything between the preverb and the verb were treated as verbal morphology. For example, nomchoimmdiu cóima “may the lord help me” is a combination of no (PART) + m (PRON) + choimmdiu (NOUN) + cóima (VERB). If everything between the preverb and the end of the verb were treated as a single token, this whole string would need to be treated as a single verb token, including the "infixed" noun and the space character.
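The tokenisation point can be illustrated with a short Python sketch, assuming the segmentation given here and a CoNLL-U-style SpaceAfter flag per word: rebuilding the written form shows that a single token spanning preverb to verb would have to contain both the infixed noun and a space. The `(form, upos, space_after)` layout and `rebuild_text` are illustrative assumptions.

```python
# Segmentation from the comment; (form, upos, space_after) mirrors SpaceAfter=No.
words = [
    ("no", "PART", False),
    ("m", "PRON", False),
    ("choimmdiu", "NOUN", True),
    ("cóima", "VERB", False),
]


def rebuild_text(words):
    """Join word forms, adding a space only where space_after is True."""
    return "".join(form + (" " if space_after else "") for form, _upos, space_after in words)


text = rebuild_text(words)
print(text)  # nomchoimmdiu cóima

# A single token running from the preverb "no" to the verb would have this whole
# string as its FORM: it would contain a space, and these parts of speech inside it.
print(" " in text)                                 # True
print([upos for _form, upos, _sp in words[1:-1]])  # ['PRON', 'NOUN']
```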

dan-zeman commented 1 year ago

Firstly, no is a particle that can be used with a range of other verbs, not just the copula.

That by itself would not convince me – other languages have prefixes that can be used with a range of verbs, so one might think that no- is also a prefix rather than a particle; but...

In such constructions it is common for pronouns to fall between no and the verb.

... this would be much more convincing, especially if these are pronouns that also occur elsewhere (in order to exclude the possibility that the "pronoun" is simply a morpheme of the verb that signals agreement).

So, the fact that da is affected by nasalisation strongly suggests that it was felt as a separate word from no.

OK, sounds good. (Although, to play a devil's advocate here :-): Could someone claim that the real rule for mutations is that they occur at the beginning of the word, unless the word has a prefix, in which case the mutation occurs after the prefix?)

If treating everything between the preverb and the end of the verb as a single token, this whole string would need to be treated as a single verb token, including the "infixed" noun and space character.

Hmm, agreed. If the previous arguments were not enough in themselves, with this one they definitely are. Insisting that no is still part of the verb would mean that there is incorporation like in polysynthetic languages, which would be very strange in the IE family, and also it would induce more problems than it would solve, as UD is not particularly known for being incorporation-friendly.

AdeDoyle commented 1 year ago

... this would be much more convincing, especially if these are pronouns that also occur elsewhere (in order to exclude the possibility that the "pronoun" is simply a morpheme of the verb that signals agreement).

These pronouns are generally referred to as "infixed" pronouns, as they are most commonly used within the verbal complex between preverbs and the remainder of verbs. However, they can also follow negative particles, interrogative particles, other particles like the augment (albeit infrequently), conjunctions, and the copula.

OK, sounds good. (Although, to play a devil's advocate here :-): Could someone claim that the real rule for mutations is that they occur at the beginning of the word, unless the word has a prefix, in which case the mutation occurs after the prefix?)

It could certainly be argued; however, that's a bit of a quagmire. The pre-standard orthography and word separation of Old Irish manuscripts make it very difficult to distinguish between a prefix and a clitic in that case. Some things are designated prefixes in the grammars and learning material, but not preverbs and particles like no, so an argument would need to be made that no is a prefix. Typically, in modern languages, the decision comes down to whether or not a space is tolerated between the proposed prefix and a following morpheme in the writing system. In Old Irish manuscripts, I've already shown that no can in some cases be followed by a space or even a new line; on the other hand, Old Irish word separation tends to combine clitics with following words, as unstressed words typically combine around a single stressed word. The term "prefix", therefore, is not a great measure of "wordiness" in the language, and no is generally referred to as a "conjunct particle" in learning material, rather than a prefix.

Hmm, agreed. If the previous arguments were not enough in themselves, with this one they definitely are. Insisting that no is still part of the verb would mean that there is incorporation like in polysynthetic languages, which would be very strange in the IE family, and also it would induce more problems than it would solve, as UD is not particularly known for being incorporation-friendly.

It would be remiss of me not to acknowledge that there has been a suggestion to this effect regarding Old Irish, based on a limited number of cases, none of which involve the "infixing" of nouns, etc. between no and following verbs which we are discussing here: 'Professor Borgstrøm has proposed a new approach to Gaelic word-boundaries on the basis of which there would be "some thirteen cases, each characterized by a prefix" and the language "would have to be classified as a (mildly) polysynthetic language"' (Ahlqvist, Anders (1974). Notes on 'Case' and Word-Boundaries. Ériu, Vol. 25, pp. 181-189). This, however, is certainly not widely accepted, and as you say, it is difficult to defend in the context of IE languages at any rate. Most importantly for the discussion here, Ahlqvist's discussion of this position is based on "the idea of regarding prepositions as case prefixes". As such, it does not account for no, as it is not a preposition, nor can it account for the infixing of nouns, adjectives, etc. following no and other preverbs.

ftyers commented 1 year ago

@AdeDoyle could you give the full tree for amal nondafrecṅdirccsa as you have it at the moment?

And what is the argument against having no as a dependent of the predicate in this sentence?

Could you also give a couple of other examples, one where the predicate is a simple NP and one where it is subordinate?

AdeDoyle commented 1 year ago

@AdeDoyle could you give the full tree for amal nondafrecṅdirccsa as you have it at the moment?

Certainly, here's the current tree:

# sent_id = 8
# reference = 9b4
# text = .i. amal nondafrecṅdirccsa
1	.i.	.i.	ADV	_	Abbr=Yes	5	advmod	_	_
2	amal	amal	SCONJ	_	_	5	mark	_	_
3	no	no	PART	_	PartType=Vb	5	compound:prt	_	SpaceAfter=No
4	nda	is	AUX	_	Mood=Ind|Number=Sing|Person=1|Polarity=Pos|Tense=Pres|VerbType=Cop	5	cop	_	SpaceAfter=No
5	frecṅdircc	frecndairc	ADJ	_	Case=Dat|Degree=Pos|Number=Sing	0	root	_	SpaceAfter=No
6	sa	sa	PRON	_	PronType=Emp	5	amod	_	_
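One property of the tree above that can be checked mechanically is that the `# text` comment agrees with the token rows: concatenating each FORM, followed by a space unless MISC contains SpaceAfter=No, should reproduce the sentence. The sketch below assumes whitespace-separated rows, which only works because no field in this sample contains a space; real CoNLL-U columns are tab-separated. `text_and_rebuilt` is a hypothetical helper.

```python
SENTENCE = """\
# sent_id = 8
# reference = 9b4
# text = .i. amal nondafrecṅdirccsa
1 .i. .i. ADV _ Abbr=Yes 5 advmod _ _
2 amal amal SCONJ _ _ 5 mark _ _
3 no no PART _ PartType=Vb 5 compound:prt _ SpaceAfter=No
4 nda is AUX _ Mood=Ind|Number=Sing|Person=1|Polarity=Pos|Tense=Pres|VerbType=Cop 5 cop _ SpaceAfter=No
5 frecṅdircc frecndairc ADJ _ Case=Dat|Degree=Pos|Number=Sing 0 root _ SpaceAfter=No
6 sa sa PRON _ PronType=Emp 5 amod _ _
"""


def text_and_rebuilt(block):
    """Return the '# text' value and the string rebuilt from FORM + SpaceAfter."""
    text, pieces = None, []
    for line in block.splitlines():
        if line.startswith("# text = "):
            text = line[len("# text = "):]
        elif line and not line.startswith("#"):
            cols = line.split()  # safe here only because no field contains a space
            form, misc = cols[1], cols[9]
            pieces.append(form if "SpaceAfter=No" in misc.split("|") else form + " ")
    return text, "".join(pieces).rstrip()


text, rebuilt = text_and_rebuilt(SENTENCE)
print(rebuilt)          # .i. amal nondafrecṅdirccsa
print(text == rebuilt)  # True
```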

There's actually another token in this tree, aside from no, which should be dependent on the copula, in my opinion. You'll note that sa is currently dependent on the predicate by necessity. sa is an emphatic particle. These are used to emphasise a previously stated person or persons. They are tagged PRON in UD because they inflect to match whatever person and number they are emphasising, but it should be noted that they do not function like pronouns in the language, and therefore cannot be considered an explicit subject. In this case, sa refers to the first person singular subject which is represented by the form of the copula. As such, it is incorrect that either no or sa should be dependent on the predicate.

And what is the argument against having no as a dependent of the predicate in this sentence?

The best argument against no being a dependent of the predicate is that it actually has nothing to do with it. It is a semantically empty particle used to support verbs and the copula. In this case, it facilitates the creation of a relative clause (which is required after amal), amal nondafrecṅdirccsa "because I am present" (lit. "for that I am present"). To demonstrate this, consider a shortened version of this example, amal nonda "for that I am". If the particle, no, is to be tied to anything in this example it has to be the copula itself, because there is no predicate and neither is there any explicit subject. The inflection of the copula itself is the only thing which hints at the subject.

Could you also give a couple of other examples, one where the predicate is a simple NP and one where it is subordinate?

I assume you mean examples of no followed by the copula, as opposed to just no in general usage? This request is a bit difficult to facilitate given the limited quantity of writing which has survived from the Old Irish period. As you'll understand, attested forms are relatively limited in the surviving corpus, however, I have been able to find the following examples:

cenotad maicsi raith "though you are sons of grace" = ce (SCONJ) + no (PART) + tad (AUX) + maic (NOUN) + si (PRON) + raith (NOUN).
cenutad suír "though you are free" = ce (SCONJ) + nu (PART) + tad (AUX) + suír (ADJ)
cenudedissidi "though you are knowledgeable" = ce (SCONJ) + nu (PART) + ded (AUX) + issidi (NOUN)

amir-zeldes commented 1 year ago

I think I agree with @nschneid and @ftyers - from what I'm seeing in the examples, I think both "no" and "sa" should be dependents of the predicate, not the copula. As discussed above for English too, the negation "n't" doesn't really modify "cold" morphosyntactically in "it isn't cold". "n't" is an enclitic negator which forms a morphophonological 'word' together with "is".

But that is not the position that UD takes: UD is lexico-centric, and takes the position that the predicate is really "be cold", and that this unit, which is being negated, is headed by the lexical item "cold". The same is true for the behavior of auxiliaries and other function words in UD, which ensures uniform behavior across languages with more or fewer functional items, or even split copula systems (Slavic, Semitic and many other languages). If I'm missing something special that distinguishes "no" from the situation with English clitic negation, please let me know! But otherwise I think I'm with @nschneid: it's the same as "n't" and the rest of the complex predicate modifiers.

jnivre commented 1 year ago

This is also how I interpret the evidence in the light of the UD annotation principles.

Best, Joakim

AdeDoyle commented 1 year ago

I think I agree with @nschneid and @ftyers - from what I'm seeing in the examples, I think both "no" and "sa" should be dependents of the predicate, not the copula.

I'm interested to know how you would reconcile that position with one of the examples I gave above, amal nonda "for that I am" = amal (SCONJ) + no (PART) + nda (AUX).

This could be simplified further to nonda "that I am" = no (PART) + nda (AUX).

We could complicate it more by adding back in the emphatic particle, sa, without an explicit predicate, nondasa "that I (as opposed to you or anybody else) am" = no (PART) + nda (AUX) + sa (PRON).

In instances like these no predicate is explicitly stated, and the subject is represented only by the copula. I don't see any way that examples such as these can be reconciled with what you're suggesting.

But that is not the position that UD takes: UD is lexico-centric, and takes the position that the predicate is really "be cold", and that this unit, which is being negated, is headed by the lexical item "cold". ... If I'm missing something special that distinguishes "no" from the situation with English clitic negation, please let me know! But otherwise I think I'm with @nschneid: it's the same as "n't" and the rest of the complex predicate modifiers.

I've mentioned above that n't is semantically meaningful, while no is completely semantically empty. As such, n't has the capacity to modify the predicate, while no does not. The semantic meaning of the copula changes in the presence of no, but the particle itself does not have any semantic force like n't that would allow us to argue that it itself is modifying anything.

I think English is a particularly poor touchstone for what is happening in Old Irish, simply because the English is cannot be interpreted as both the copula and the subject in one. In a simple English phrase like it isn't, I suspect that both is and n't would be dependent on the subject, the neuter pronoun, it. In Old Irish that can't happen, at least where no subject is explicitly stated. In English you can say "it is dark" or "the night is dark", and the copula links subject and predicate in the same manner. In Old Irish, the copula operates differently in each formation: in is dorcha "it is dark", the copula is represents the subject "it", but in is dorcha in adaig "the night is dark" (lit. "it is dark the night"), the subject "the night" is specifically stated and also represented by the copula. In the latter, we can make comparisons to English or Modern Irish; in the former, however, the copula acts more like the Latin verb est "it is". The Old Irish copula would appear to be in a transitional stage between verb and copula, but it is still very much the copula (and there is a separate substantive verb). It cannot really be treated as a different POS depending on whether the subject is explicitly stated or not, and it maintains certain verb-like qualities, such as being affected by no, despite no longer being a verb.

nschneid commented 1 year ago

In UD, if there is no predicate separate from the copula, the copula gets "promoted" to predicate. In that case it can have dependents. But when it attaches to the predicate as cop it cannot. Either way it is tagged as AUX.

It may feel strange to make something the dependent of the copula only when there is no main predicate, and the dependent of the main predicate otherwise. But that is how UD works as a compromise across languages. Every language has some constructions that are a bit awkward in UD.

amir-zeldes commented 1 year ago

In instances like these no predicate is explicitly stated, and the subject is represented only by the copula. I don't see any way that examples such as these can be reconciled with what you're suggesting.

Sure, this is similar to English "I didn't!". In such situations, UD promotes the auxiliary to take the place of the missing lexical predicate. See the UD guidelines on promotion here.

the particle itself does not have any semantic force like n't that would allow us to argue that it itself is modifying anything.

I would say that it's modifying the predication as a whole, much like we attach discourse dependents or other stance adverbials to the sentence root. Note that modifiers of auxiliaries and copulas in general attach to the lexical predicate in UD. For example in the tree for "Ugh, indeed this is still not so simple", everything depends on "simple", including modifiers which ostensibly belong to the sentence as a whole or to the auxiliary in a non-UD analysis:
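A hedged reconstruction of the analysis described, as (id, form, upos, head, deprel) rows. The labels are assumed from common English UD practice, and attaching the comma to "simple" is also an assumption. The check reflects the point being made: every non-root token depends on "simple", and the copula "is" has no dependents.

```python
# Assumed UD analysis of "Ugh, indeed this is still not so simple".
TREE = [
    (1, "Ugh", "INTJ", 9, "discourse"),
    (2, ",", "PUNCT", 9, "punct"),
    (3, "indeed", "ADV", 9, "advmod"),
    (4, "this", "PRON", 9, "nsubj"),
    (5, "is", "AUX", 9, "cop"),
    (6, "still", "ADV", 9, "advmod"),
    (7, "not", "PART", 9, "advmod"),
    (8, "so", "ADV", 9, "advmod"),
    (9, "simple", "ADJ", 0, "root"),
]

# Set of heads used by non-root tokens, and the dependents of the copula (id 5).
heads = {head for _id, _form, _upos, head, deprel in TREE if deprel != "root"}
cop_children = [form for _id, form, _upos, head, _deprel in TREE if head == 5]
print(heads)         # {9}
print(cop_children)  # []
```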

In a simple English phrase like it isn't, I suspect that both is and n't would be dependent on the subject, the neuter pronoun, it

No, based on the promotion guidelines above, "is" would be the root.

the copula acts more like the Latin verb est "it is"

Sure, but in Latin too, we would consider "est" to be a copula as soon as the predicate is present:

This is how the UD Latin treebanks are annotated as well. See also this recent paper about harmonizing UD Latin annotation, which strongly adheres to the copula-as-auxiliary premise.

ftyers commented 1 year ago

In UD, in deciding what the head should be (see Syntax), we have a couple of principles that might be useful here:

So in the case you describe:

We could complicate it more by adding back in the emphatic particle, sa, without an explicit predicate, nondasa "that I (as opposed to you or anybody else) am" = no (PART) + nda (AUX) + sa (PRON). In instances like these no predicate is explicitly stated, and the subject is represented only by the copula. I don't see any way that examples such as these can be reconciled with what you're suggesting.

You can make the copula the head (promotion by head elision) in the absence of a content word.

AdeDoyle commented 1 year ago

It may feel strange to make something the dependent of the copula only when there is no main predicate, and the dependent of the main predicate otherwise. But that is how UD works as a compromise across languages.

Well, if it is a compromise for the sake of agreement across languages, then it is easier to accept. It's a functional workaround with a purpose, albeit one which does not necessarily reflect the reality of every language. However, this different treatment of the copula in different situations doesn't strike me as too egregious. The real root of my concern is that I perceive that the copula in Old Irish can move between being more of a content word and more of a function word, in a way which I don't believe it does in many other languages. For example, where no explicit subject is present, and the subject is only reflected in the copula's inflection, it seems to me that it would be equally appropriate to connect the copula to the predicate with the nsubj relation as with cop.

@amir-zeldes gave an interesting example from English above, "Ugh, indeed this is still not so simple", where "this" is the subject. I'd appreciate it if someone could expand on how "this" acts in such a sentence. It is dependent on the predicate, yes, but is it a function word or a content word? If the sentence were, instead, "Ugh, indeed this, itself, is still not so simple", then I would expect "itself" to be dependent on the subject, "this", not on the predicate "simple". Would that be correct?

If so, then by extension, if "this" and "is" were considered a single word in English, "this_is, itself, not so simple", would "itself" not still be dependent on "this_is"? Wouldn't it be inappropriate for "simple" to be its head? This is how I perceive the idea of connecting sa to the predicate when it emphasises the subject in an Old Irish copula formation.

Note that modifiers of auxiliaries and copulas in general attach to the lexical predicate in UD. For example in the tree for "Ugh, indeed this is still not so simple", everything depends on "simple", including modifiers which ostensibly belong to the sentence as a whole...

Yes, but all the modifiers in this sentence have some semantic meaning, except maybe "ugh", which has a special dependency relation, discourse. I'd be very interested to know if any language has a copula that inflects to complement/take the place of the predicate in the same way the Irish copula can for the subject. I suspect it would need to be possible for the copula to be the head most of the time in such a language.

Nevertheless, as it seems that there's some agreement that the predicate is the appropriate head for no, at least in UD terms, I should ask: is there any suggested means to distinguish no headed by a predicate from no headed by a verb? For example, nonda "that I am" = no (PART) + nda (AUX) versus nomcharat "they love me" = no (PART) + m (PRON) + charat (VERB). Should only the verb-headed no use the relation compound:prt? If so, what should the other, auxiliary no use?

Also, if I understand the last three responses correctly, the validator can determine whether a copula is promoted to the head as a result of no explicit predicate being present. If that is the case, do we think no should continue to be connected to the copula with the compound:prt relation where the copula is the head, as no relates to the copula in the same way as it does to verbs?

amir-zeldes commented 1 year ago

this [...] is it a function word or a content word?

When 'this' is the entire NP, there is no real distinction between function/content - it is the only exponent of the subject NP (would be nominative in a nom/acc language). If it's used as a determiner, then it's safe to call it a function word ("this <-det- book"), and then the lexical noun is the head.

then I would expect "itself" would be dependent on the subject

Yes, at least in the English corpora, emphatic reflexives are annotated as dependents of the NP they modify, using the label nmod:npmod (examples). If they are not contiguous, it is usually interpreted as a dependent of the verb and labeled obl:npmod.

if "this" and "is" were considered a single word in English

If that were the case, we would need to do one of the following things:

Assume this is a multiword token that needs to be broken up, in the same way the French "au" is subtokenized into "à" + "le". Then each subtoken gets its own deprel etc.
Assume that this is primarily an argument role filler. In this case, there is no copula, and we have a nominal sentence with no verbal component.
Assume this is primarily a verbal word, in which case we are looking at pro-drop, and the only expression of the subject is simply the agreement behavior of the verb's morphology. In this case, the word is really a regular pro-drop copula, which is either the root if there is no additional predicate, or a cop dependent otherwise.

From everything written above, I understood that the situation is 3., i.e. the word in question is like Latin "est" with no overt subject, or Italian "è" in "è bellissimo", 'it's beautiful!', which would definitely be root(bellissimo), cop(bellissimo,è).

Should only the verb-headed no use the relation compound:prt?

I'm not sure any of these should be compound:prt. What is the reasoning for calling it a compound? Is it a morphological reason, or is there a special sense/dictionary entry for no+nda? From the examples it looks like it's a clause level particle that is free to combine with any predication (copula or otherwise), so I would have probably gone with discourse (if it's like an interjection) or advmod if there are signs that it's adverbial.

AdeDoyle commented 1 year ago

From everything written above, I understood that the situation is 3., i.e. the word in question is like Latin "est" with no overt subject, or Italian "è" in "è bellissimo", 'it's beautiful!', which would definitely be root(bellissimo), cop(bellissimo,è).

We discussed pro-dropping above. As I mentioned there, I really don't think we can say that what's occurring with the Old Irish copula constitutes pro-dropping. It would be grammatically incorrect for a discrete subject pronoun to be used at all. The copula is not inflecting to agree with a dropped subject pronoun; rather, the copula includes the subject, and therefore emphatic pronouns can refer back to it, much as English reflexives refer back to subject pronouns. But it seems that UD cannot accommodate a copula which functions this way, so a workaround is necessary, similar to what would be done if pro-dropping were occurring.

So, in the Italian example you give, "è bellissimo", what would a reflexive emphatic pronoun point to if it were used (I'm not sure if this is a feature of Italian, forgive my ignorance)? Would it point to è or bellissimo? It seems like it should be the former to me.

I'm not sure any of these should be compound:prt. What is the reasoning for calling it a compound? Is it a morphological reason, or is there a special sense/dictionary entry for no+nda? From the examples it looks like it's a clause level particle that is free to combine with any predication (copula or otherwise), so I would have probably gone with discourse (if it's like an interjection) or advmod if there are signs that it's adverbial.

TL;DR - Because no acts like other "preverbs" which are connected to verbs using compound:prt.

no is neither like an interjection nor an adverbial. It is semantically meaningless, so it's impossible to compare it to either. It is considered a "conjunct particle" in Old Irish. Other conjunct particles include the interrogative and negative particles. I use advmod to join negative particles to verbs, for example, but that is because they modify the verb. I reckoned advmod was inappropriate for no as, by definition, it cannot modify anything.

Other "preverbs" exist which were historically discrete prepositions, but these had become part of the verbal complex by the Old Irish period. For example, there is a compound verb, do-beir "he gives", composed of the preverb do and a verbal root beir, and beir is an expression of a simple verb, beirid "he carries/bears". These preverbs form a very close relationship with the remainder of the verb, and are considered part of the verb for the purpose of alphabetised lexicons and glossaries. As such, they are comparable to English compound verbs like "overthrow" and "undercut", where the prepositions "over" and "under" have combined with the verbs "throw" and "cut".

The difficulty is that, in English, object pronouns will never bisect the two elements of compound verbs, for example "overthrow him", never "under-him-throw". In Old Irish, this is what always happens if an object pronoun is used with a compound verb, do-m-beir "he gives me". This is comparable to English phrasal verbs like "ask out" where the object can come between the two elements, "ask him out". Unlike English dictionaries, however, which would never consider the two elements of a phrasal verb to be a single lexical item, Old Irish lexical resources consistently treat compound verbs as discrete lexemes. In some cases, similar to the example I gave above, nomchoimmdiu cóima, other parts of speech like nouns and adjectives can even occur between the preverb and remainder of the verb, so preverbs are treated here as separate tokens from the rest of the verb.

So where does no come into all of this? Object pronouns cannot be used with simple verbs in isolation, because the object pronoun must follow a preverb or conjunct particle and precede the rest of the verb. In adherence to this syntactic rule, if an object pronoun is used with a simple verb like beirid, the empty particle no must be used where otherwise a preverb or other conjunct particle might occur: no-m-beir "he carries me". Any preverb or particle which had semantic meaning would change the meaning of the verb if added, e.g. ní-m-beir "he does not carry me". So no is a semantically completely meaningless particle used to satisfy grammatical requirements, such as enabling the correct syntax by which an object pronoun can occur with a simple verb.

Preverbs are connected to the verb with compound:prt, and so it seemed appropriate to do the same with no, at least where it fulfils the same purpose as a preverb in the verbal complex. Perhaps another relation is more appropriate when it is used with the copula, especially if it is to be connected to the predicate. Still, I can't imagine what would be appropriate. As I mentioned, either advmod or discourse would imply some semantic modification of the head, which no is incapable of, as it has no semantic function.

dan-zeman commented 1 year ago

... if any language has a copula that inflects to complement/take the place of the predicate in the same way the Irish copula can for subject. I really don't think we can say that what's occurring with the Old Irish copula constitutes pro-dropping. It would be grammatically incorrect for a discrete subject pronoun to be used at all. The copula is not inflecting to agree with a dropped subject pronoun, rather the copula includes the subject, and therefore emphatic pronouns can refer back to it like with English reflexives to subject pronouns.

I guess I'm not ready to buy the argument that the copula “contains” the subject. If there is a noun acting as the subject, the form of the copula does not change (citing @AdeDoyle's example: is dorcha in adaig "the night is dark"). Same in Czech: noc je temná lit. "night is dark" vs. simply je temná lit. "(it).is dark". From what I read above, the only difference between Czech and Old Irish is that in Old Irish it would not be grammatical to add a separate subject pronoun; in Czech it is not ungrammatical but very unlikely, used only for emphasis if necessary: ona je temná "she is dark". It still seems to be a dropped pronoun; but in Old Irish the drop is mandatory while in Czech it is optional, although strongly preferred.

If you use an emphatic pronoun like sám "oneself" in Czech, the overt pronoun is more likely to be present (because of the emphasis), and then sám will obviously be attached to it (já sám jsem za to zodpovědný "I myself am responsible for this"). But if the pronoun is not there, sám will be attached to the predicate (much like adjectives in secondary predication): (udělám to sám "(I).will.do it myself").

... the validator can determine whether a copula is promoted to the head as a result of no explicit predicate being present. If that is the case, do we think no should continue to be connected to the copula with the compound:prt relation where the copula is the head, as no relates to the copula in the same way as it does to verbs?

Yes. The validator recognizes the copula by the cop relation. When the copula is promoted to the head position, the cop relation is no longer there and the validator will not complain about children of the copula node. Also, it is natural that the promoted node will inherit other children of the elided node (which would be its siblings if the ellipsis did not occur); there is no other option anyway.
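The point that the check keys on the relation rather than the POS tag can be illustrated with a small Python sketch. It is a hypothetical simplification, not the official UD validator code: the `Token` class, the function name, the placeholder predicate "PRED", and the whitelist of permitted child relations (other than `fixed`) are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed whitelist of child relations tolerated under a `cop` node.
# `fixed` is the exception discussed in this issue; the rest are guesses.
ALLOWED_CHILD_RELS = {"fixed", "goeswith", "conj", "cc", "punct"}


@dataclass
class Token:
    id: int
    form: str
    head: int  # 0 means the token is the root
    deprel: str


def cop_children_errors(sentence: list[Token]) -> list[str]:
    """Flag children of `cop` nodes whose relation is not whitelisted."""
    by_id = {tok.id: tok for tok in sentence}
    errors = []
    for tok in sentence:
        parent = by_id.get(tok.head)
        if parent is None:
            continue  # the root has no parent token
        # Only the parent's relation is inspected, never its UPOS tag,
        # so a copula promoted to root is not subject to this rule.
        if parent.deprel.split(":")[0] != "cop":
            continue
        if tok.deprel.split(":")[0] not in ALLOWED_CHILD_RELS:
            errors.append(
                "'cop' not expected to have children "
                f"({parent.id}:{parent.form}:cop --> {tok.id}:{tok.form}:{tok.deprel})"
            )
    return errors


# Simplified toy trees (hypothetical IDs); "PRED" stands in for a lexical predicate.
with_predicate = [
    Token(1, "no", 2, "compound:prt"),
    Token(2, "nda", 3, "cop"),
    Token(3, "PRED", 0, "root"),
]
promoted = [
    Token(1, "no", 2, "compound:prt"),
    Token(2, "nda", 0, "root"),
]

print(cop_children_errors(with_predicate))
print(cop_children_errors(promoted))
```

With a predicate present, the copula is attached as cop and its compound:prt child is flagged once; once the copula is promoted to root, the same child passes.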

tlynn747 commented 1 year ago

Thanks for tagging me @nschneid. I'm not sure I'd be much help here because, as @AdeDoyle points out, modern Irish is very different to Old Irish. The copula in modern Irish gives us a headache in UD, but this seems to be a whole new can of worms ;) I'd imagine Dorus Fransen would be able to contribute too.

Re the discussion of subject/predicate in the example of is mise rí na hÉireann "I am king of Ireland": the confusion around this merely comes from the English phrasing of such a sentence. In fact, a more accurate translation is really "The king of Ireland is me" (not you, or him). Mise is the emphatic form that is explicitly telling you the new information. I wrote about this in the context of UD in my thesis (page 64) https://doras.dcu.ie/21014/1/Teresa_PhDThesis_final.pdf

So the copular construction analyses are usually COP PRED SUBJ

I also reference in that section of the thesis that there is an argument that the copula in Irish is really just a linking particle between a subject and predicate. I didn't follow that analysis, but it's worth reflecting on...

But the main discussion here is about no and (1) what it should be attached to and (2) what the relation is.

For (2) I would probably disagree with the compound analysis, if you say that no is semantically empty. We use compound:prt in the Modern Irish treebank for particle verbs (give up, lay out, etc.).

Other preverb particles (like ní) are attached as advmod. If it's not adverbial, what about mark:prt?

I do see your argument that the copula contains the subject am rí "I_am a king", at rí "you_are a king", is rí "he/it_is a king". As an Irish speaker I get that :) Maybe it's similar to the pronominal prepositions in the sense of being marked morphologically.

My observation and suggestion to get around this issue of copulas not having dependents (based on limited knowledge of Old Irish and the discussion above) is as follows. I'd argue that you could attach nda to the root as nsubj where you're focusing on the nominal feature of that word instead of the copular feature. There is no labelled subject in the analysis of the sentence as it stands, which is strange in itself. Syntactic analysis without a labelled copula is fine - but without a subject seems suspicious. The morph features of nda could then capture the copular aspect (so that you don't lose that information). And then attaching no and sa to nda wouldn't be such an issue. UPOS = PRON?

AdeDoyle commented 1 year ago

If there is a noun acting as the subject, the form of the copula does not change (citing @AdeDoyle's example: is dorcha in adaig "the night is dark").

This is true only in persons and numbers other than the 3rd plural. As you'll see in another example I gave, it insi ériu ocus albu "Ireland and Britain are islands" (lit. "they_are islands Ireland and Britain"), the third plural takes the form it "they are".

In any case, I tend to interpret this usage of the copula as a development towards the way it acts in modern Irish anyhow, with Copula-Predicate-Subject structure. It's the copula forms without an overt subject that are more troublesome, as their closest comparisons in modern Irish require a verbal construction. To go back to my example you cited, is dorcha in adaig "the night is dark" (lit. "it is dark the night") is comparable to the modern Irish is dorcha an oíche. However, if we get rid of the overt subject in Old Irish is dorcha "it is dark", the modern Irish equivalent uses the substantive verb tá and a subject pronoun, sé, to form tá sé dorcha.

I guess I'm not ready to buy the argument that the copula “contains” the subject. ... If you use an emphatic pronoun like sám "oneself" in Czech, it is more likely that the overt pronoun will be there (because it is emphasis), then obviously sám will be attached to it (já sám jsem za to zodpovědný "I myself am responsible for this"). But if it is not there, sám will be attached to the predicate (much like adjectives in secondary predication): (udělám to sám "(I).will.do it myself").

So, there is a precedent for attaching a subject pronoun to a predicate in UD, but it does involve pro-dropping. Interesting, though I think these are not 100% comparable scenarios.

Yes. The validator recognizes the copula by the cop relation. When the copula is promoted to the head position, the cop relation is no longer there and the validator will not complain about children of the copula node. Also, it is natural that the promoted node will inherit other children of the elided node (which would be its siblings if the ellipsis did not occur); there is no other option anyway.

I understood that the validator would fail it if it was either POS-tagged as AUX or related by cop, but if it's based on the relation alone then I must say I really like @tlynn747's suggestion:

I'd argue that you could attach nda to the root as nsubj where you're focusing on the nominal feature of that word instead of the copular feature. There is no labelled subject in the analysis of the sentence as it stands, which is strange in itself. Syntactic analysis without a labelled copula is fine - but without a subject seems suspicious. The morph features of nda could then capture the copular aspect (so that you don't lose that information). And then attaching no and sa to nda wouldn't be such an issue.

This seems much preferable, and much more representative of the reality of the language. If the other UD experts who've weighed in so far are happy with this analysis also, I'll effect these changes in the treebanks.

Other preverb particles (like ní) are attached as advmod. If it's not adverbial, what about mark:prt?

I also prefer this option to advmod, discourse or even compound:prt. My one concern would be that mark is generally used with conjunctions, and "marks" a separate clause. That would not be appropriate for no, though perhaps I'm reading too much into the function of the mark relation, and there may already be uses of mark in other treebanks which do not stick to this use case.

dan-zeman commented 1 year ago

If there is a noun acting as the subject, the form of the copula does not change (citing @AdeDoyle's example: is dorcha in adaig "the night is dark").

This is true only in persons and numbers other than the 3rd plural. As you'll see in another example I gave, it insi ériu ocus albu "Ireland and Britain are islands" (lit. "they_are islands Ireland and Britain"), the third plural takes the form it "they are".

So if the subject is 3rd person plural, the copula will have different forms depending on whether noun(s)-subject(s) are present? If I want to say (perhaps as an answer to "what are Ireland and Britain?") "They are islands", will the copula change its form to signal that now it includes the subject because the nouns are not present? Or will it still be it insi?

I understood that the validator would fail it either if it was POS tagged as AUX or related by cop, but if it's based on the relation alone

Yes. Tests like this one must be based on the relation alone because of the possible promotion in case of ellipsis. (There are other tests though, that check the compatibility of some relations with some UPOS tags.)

I really like @tlynn747's suggestion

Teresa writes about morph features, not about the UPOS tag. Having AUX as nsubj would be quite strange, although the current version of the validator probably won't flag it; but I thought you wanted to re-tag it as PRON. Then it would go very well with the nsubj relation but perhaps the features would be strange (if, for example, you need Tense). But there are no UD-wide restrictions on what features you can use with what UPOS category, so you will be able to explain it in documentation and then register the features you need with PRON.

AdeDoyle commented 1 year ago

So if the subject is 3rd person plural, the copula will have different forms depending on whether noun(s)-subject(s) are present?

In fact, it is the 3rd person plural form of the copula in any case. It's the use of the 3rd singular that's alarming with overt subjects. It's used with 1st and 2nd person sg. and pl. as well as with 3rd sg.

Stifter describes this better than I do, "All independent personal pronouns except for the 3rd pl. can be construed with the 3rd sg. of the copula: is mé 'it is I,' is tú 'it is thou', is é 'it is he', is sí 'it is she', is ed 'it is it', is sní 'it is we', is sib 'it is you.' Only the 3rd pl. always takes the 3rd pl. form of the copula: it é 'it is they' ..." (Stifter, David [2006]. Sengoidelc. Syracuse University Press, p. 171.).

All copula forms other than 3rd sg. and pl., therefore, inflect for subject only when no overt subject is present. 3rd plural is already inflected correctly. Forgive the Simon & Garfunkel vibe of the following examples:

No overt subject

Sg.

am inis "I am an island"
at inis "thou art an island"
is inis "he/she/it is an island"

Pl.

ammi insi "we are islands"
adi insi "you are islands"
it insi "they are islands"

BUT

Overt subject

Sg.

is mé "it is I"
is tú "it is thou"
is é/sí/ed "it is he/she/it"

Pl.

is sní "it is we"
is sib "it is you"
it é "it is they" (lit. "they are they")

If the 3rd plural formation with no overt subject followed the pattern of using is, this could cause confusion with the 3rd singular masc. Both would be is é, as the 3rd sg. masc. and 3rd pl. pronouns are the same, é. This, perhaps, is the reason this formation resisted the use of is in this one position, at least, until a discrete 3rd plural personal pronoun, iad, emerged.

Regardless, it is clear that this formation is interpreted by Stifter (as well as in other learning and grammar material) as the 3rd sg. and pl. forms of the copula being used to express the subject(s), with the following pronouns being the predicates "it is me", "it is you", "it is they" etc. As this is analogous to what happens with the copula in modern Irish treebanks, however, it seemed logical to me to treat this use of the copula as it is in modern Irish treebanks, at least, until the subject of words being dependent on the copula is raised. The same cannot be done where there is no overt subject.

If I want to say (perhaps as an answer to "what are Ireland and Britain?") "They are islands", will the copula change its form to signal that now it includes the subject because the nouns are not present? Or will it still be it insi?

Specifically as a response to an interrogative, I personally suspect an emphatic particle would be likely to be used as well, it insi-som "they are islands", however, in general constructions something like it insi would be perfectly acceptable. Compare, for example, with Wb. 1c7 .i. it huissi ɫ. itcointfi "i.e. they are worthy or they are proper".

Yes. Tests like this one must be based on the relation alone because of the possible promotion in case of ellipsis. (There are other tests though, that check the compatibility of some relations with some UPOS tags.) ... Teresa writes about morph features, not about the UPOS tag. Having AUX as nsubj would be quite strange, although the current version of the validator probably won't flag it; but I thought you wanted to re-tag it as PRON.

I see she has suggested UPOS = PRON, though that would be very hard to square with the grammar, and diachronic development of the language. There is no pronoun there in Old Irish, it seems there never was, and these copula forms never develop into a pronoun in later forms of Irish. If anything, I'd be inclined to change it to VERB, but I think if I can get away with doing so, leaving it as AUX would be the right choice. Particularly as it seems it has always been an inflecting auxiliary verb, even in the prehistory of the language. For example, Stifter (p. 120) reconstructs the Proto-Celtic forms:

Sg.

*emmi "I am"
*esi "thou art"
*esti "he/she/it is"

Pl.

*emmosi "we are"
*etesi "you are"
*(s)inti "they are"

It may have been the case, in the prehistory of the language, that forms such as these would have been used with independent personal subject pronouns, and the inflection marked agreement. We can only conjecture without attestation. By the Old Irish period, however, this was not the case. The copula was the primary expression of the subject, and emphatic particles, etc., could inflect to emphasise this subject without any need or expectation that there should be another subject pronoun.

As a compromise between UD and the grammar as it is understood in the books, I like the interpretation that the copula can be promoted in the same way as if pro-dropping were occurring, using nsubj instead of cop, as @tlynn747 suggested, but maintaining the POS tag which identifies it as also being an auxiliary verb. This allows the more intuitive linguistic understanding of the Old Irish copula to be realised, while also highlighting that it contains the subject, which is arguably the more important component.

jnivre commented 1 year ago

To me this looks like two different constructions, where the second series does not have overt subjects either. Instead, the independent personal pronouns function as (nominal) predicates. Compare:

she is an island: cop(island, is), nsubj(island, she)

it is she/her: cop(she/her, is), expl(she/her, it)

The only thing that is unexpected is then the “agreement” in the third person plural. Does that make sense?

Joakim

AdeDoyle commented 1 year ago

To me this looks like two different constructions, where the second series does not have overt subjects either.

I deliberately left out the overt subject for the sake of avoiding confusing literal translations into English. However, see instead:

Overt subject

Sg.

is mé rí "I am king" (lit. "it is I king")
is tú rí "thou art king" (lit. "it is thou king")
is é/sí/ed rí "he/she/it is king" (lit. "it is he/she/it king")

Pl.

is sní ríg "we are kings" (lit. "it is we kings")
is sib ríg "you are kings" (lit. "it is you kings")
it é ríg "they are kings" (lit. "they are they kings")

But, it is good that you note how the pronouns could function in this second construction as nominal predicates. This is the link between the two systems. If left without overt subjects, as I had left them above, these second construction forms could be considered simply as forms of the copula with inbuilt 3rd person subjects, and with predicates of different person and number indicated by the pronouns.

The only thing that is unexpected is then the “agreement” in the third person plural. Does that make sense? Joakim

I tentatively explained this above as probably resulting from reluctance to tolerate linguistic ambiguity. If the 3rd singular form of the copula were used across the paradigm, both the 3rd singular masc. and 3rd plural would be identical: is é. Nevertheless, the best way to understand it, I think, is probably as I cited Stifter translating it above, "it is they".

amir-zeldes commented 1 year ago

All copula forms other than 3rd sg. and pl., therefore, inflect for subject only when no overt subject is present.

There is a somewhat similar phenomenon in (Standard/Classical) Arabic, where plural verb inflection in VS(O) sentences only occurs if there is no overt subject, but when the subject is mentioned, the verb appears in what looks like the singular form (this has been explained by some historical linguists as reflecting an earlier language stage where the plural inflection might have been a pronoun itself). In any case, I don't think that needs to play into the deprel: if it's a copula (and not a pronoun), then it should be cop when the predicate is mentioned (or promoted otherwise), and it can't have dependents - this is also how UD Arabic behaves.

I'd argue that you could attach nda to the root as nsubj where you're focusing on the nominal feature of that word instead of the copular feature.

Yes, I also think that would be totally fine (option 2 in my previous comment), but it sounds like that is not what @AdeDoyle would like to prioritize.

no is semantically a completely meaningless particle used to satisfy grammatical requirements, like enabling the correct syntax by which an object pronoun can occur with a simple verb

I think I understand why this argues for compound:prt, if only to keep it parallel to other preverb + verb constructions. Let me try a fake English translation of this to see if I have this right: I think it's as if English had a copula with a meaningless phrasal particle "out", and since we use compound:prt for "watch out", we would also want the same analysis for the hypothetical complex copula "be out". This hypothetical version of English would have sentences like "Kim is a student out", which means exactly the same as "Kim is a student" (and the conditions on when 'out' is required could have to do with word order, presence of overt arguments, etc.). Is that right?

If so, I do understand why compound:prt is attractive, but that is simply not an option based on the UD guidelines as soon as we say that this is a copula. The only ways to keep its incoming deprel as cop but keep the particle as a dependency chain with the copula are either fixed or flat, and both of those would have to go left to right (keeping in mind that this does not indicate that the particle is the head; both of these labels are officially headless and just left-to-right by convention).
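The fixed alternative can be sketched in Python as a re-attachment of the two words, with the leftmost word carrying the unit's external relation. This is a hypothetical illustration: the `Token` class, the `make_fixed` helper, and the placeholder predicate "PRED" are assumptions, and the sketch assumes the copula (the second word) currently holds the external attachment. As pointed out at the start of this issue, under this analysis the validator would also need no listed among the Old Irish copula lemmas.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    id: int
    form: str
    head: int  # 0 means the token is the root
    deprel: str


def make_fixed(sentence: list[Token], first_id: int, second_id: int) -> list[Token]:
    """Re-annotate two words as one `fixed` unit, headed by the leftmost word.

    Assumes the second word (the copula) currently carries the unit's
    external attachment, as in `no <-compound:prt- nda -cop-> PRED`.
    """
    if first_id >= second_id:
        raise ValueError("fixed/flat are annotated left to right")
    external = next(t for t in sentence if t.id == second_id)
    result = []
    for tok in sentence:
        if tok.id == first_id:
            # The leftmost word inherits the unit's head and relation.
            result.append(replace(tok, head=external.head, deprel=external.deprel))
        elif tok.id == second_id:
            # The rest of the unit hangs off the first word as `fixed`.
            result.append(replace(tok, head=first_id, deprel="fixed"))
        elif tok.head == second_id:
            # Any other dependents of the unit attach to its first word.
            result.append(replace(tok, head=first_id))
        else:
            result.append(tok)
    return result


before = [
    Token(1, "no", 2, "compound:prt"),
    Token(2, "nda", 3, "cop"),
    Token(3, "PRED", 0, "root"),
]
for tok in make_fixed(before, 1, 2):
    print(tok.id, tok.form, tok.head, tok.deprel)
```

After the rewrite, no is attached to the predicate as cop and nda depends on it as fixed, mirroring the 3:no:cop --> 4:nda:fixed pattern suggested at the top of the issue.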

AdeDoyle commented 1 year ago

if it's a copula (and not a pronoun), then it should be cop when the predicate is mentioned (or promoted otherwise), and it can't have dependents - this is also how UD Arabic behaves.

I'd argue that you could attach nda to the root as nsubj where you're focusing on the nominal feature of that word instead of the copular feature.

Yes, I also think that would be totally fine (option 2 in my previous comment), but it sounds like that is not what @AdeDoyle would like to prioritize.

I'm getting the impression that "promotion" of an AUX to nsubj is not something people are happy with. That seemed like the best solution to me. I think nsubj is preferable as a deprel, but I can't say I would prioritise this above identifying the copula as a copula, or vice versa. It's not that I'm particularly tied to the AUX part-of-speech, as the Old Irish copula doesn't seem to be functioning as UD typically understands auxiliary verbs. I think it would be very incorrect to consider it a pronoun, or anything very distinct from a copula, though.

I think I understand why this argues for compound:prt, if only to keep it parallel to other preverb + verb constructions. Let me try a fake English translation of this to see if I have this right: I think it's as if English had a copula with a meaningless phrasal particle "out", and since we use compound:prt for "watch out", we would also want the same analysis for the hypothetical complex copula "be out". This hypothetical version of English would have sentences like "Kim is a student out", which means exactly the same as "Kim is a student" (and the conditions on when 'out' is required could have to do with word order, presence of overt arguments, etc.). Is that right?

Yes, that's the general logic behind it. However, "out" still has semantic meaning in "watch out": it changes the meaning of "watch" when it is added to it. In this sense "out" is acting more like a regular Old Irish preverb than like the semantically empty no.

I'm going to switch to using "kick out" instead of "watch out" for examples because you can say things like "kick him out" or "Kim kicks a student out", which is like what happens in Old Irish compound verbs. The distinction between no and other preverbs comparable to "out" is that in Old Irish you can say something like "Kim kicks him out", because it has the particle "out". The construction "Kim kicks him" is impossible, however, because some sort of particle is needed to allow the object pronoun to be used with the simple verb "kick". You would need to use a completely meaningless particle in place of "out" to allow you to use the pronoun "him" as the object of the verb "kick" without changing its meaning. Let's just define such an empty particle for English, say "ip", which is functionally equivalent to Old Irish no. So, "Kim kicks him'ip".

Use of no for grammatical purposes can get a bit more complex than just infixing pronouns within the verbal complex, although its uses are more limited with the copula (you don't get pronouns infixed between the particle and the copula as you can with verbs). You do get another grammatical use of no, however, whereby the particle is used to support "nasalisation", the insertion of an n. Nasalisation can be used to mark a relative clause, but it can't attach to a verb or copula on its own; it needs a particle. This is the reason behind the second n in nonda "that I am": it is a nasal which marks a relative clause, itself supported by no.

To compare this construction to the English verb "kick" and the empty particle "ip" coined above, let's assume there's a relative marker n which cannot be added to a verb/copula without a particle to support it. "It is Kim kicks'n'out the student" would mean something like "it is Kim who kicks the student out". This, again, is fine, because the particle "out" occurs naturally in the sentence. This n cannot be attached to the verb "kick" without the help of some sort of particle, however, so expressing "it is Kim who kicks the student" as "it is Kim kicks'n the student" is impossible. The empty particle must be used to support the nasal: "it is Kim kicks'n'ip the student".

This relative construction can also be applied to the copula. To expand the English example, "is" can occur in isolation with the regular meaning. It cannot have particles like "out" added to it like verbs can, but the empty particle "ip" must be added in some situations. You can say "Kim is" normally, but if you want to express "that Kim is" you need nasalisation, "Kim is'n". Nasalisation cannot occur, however, without the empty particle, hence "Kim is'n'ip". So, if you want to construct the sentence "Are you telling me that Kim is nice? It was Kim who kicked the student", you'd have to write "Are you telling me Kim is'n'ip nice? It was Kim kicked'n'ip the student."
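The rule in this analogy fits in a tiny Python function. It is only a toy model of the invented English forms above (kicks'n'out, kicks'n'ip, is'n'ip), not a model of Old Irish morphology. The names `relative_form` and `EMPTY_PARTICLE` are hypothetical.

```python
# Toy model of the invented English used above (not of Old Irish itself):
# the relative nasal "n" can only be attached with a supporting particle.
EMPTY_PARTICLE = "ip"  # the thread's stand-in for Old Irish "no"


def relative_form(verb, particle=None, is_copula=False):
    """Return the verb with relative 'n' plus its supporting particle."""
    if is_copula and particle is not None:
        raise ValueError("the copula takes no meaningful particle such as 'out'")
    # A meaningful particle can host the nasal; otherwise the empty one must.
    support = particle if particle is not None else EMPTY_PARTICLE
    return f"{verb}'n'{support}"


print(relative_form("kicks", "out"))        # kicks'n'out
print(relative_form("kicks"))               # kicks'n'ip
print(relative_form("is", is_copula=True))  # is'n'ip
```

The copula branch captures the point of the analogy. Since "is" can never take a meaningful particle, every relative copula form needs the empty one.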

It's a bit of a head-wreck.

If so, I do understand why compound:prt is attractive, but that is simply not an option based on the UD guidelines as soon as we say that this is a copula. The only ways to keep its incoming deprel as cop but keep the particle as a dependency chain with the copula are either fixed or flat, and both of those would have to go left to right (keeping in mind that this does not indicate that the particle is the head; both of these labels are officially headless and just left-to-right by convention).

Understood. And this may be the necessary workaround, but I think it would be preferable to avoid this outright by using nsubj instead of cop.

Stormur commented 1 year ago

Just chiming in (not wanting to restart the discussion all over 👀) to point out that, as recently discussed by email (is it possible to link those exchanges?), this could be one of the cases very easily solved by means of multiword tokens (MWT), instead of the "SpaceAfter=No splitting style" presented in this thread by the original poster. Then no would simply appear in the surface token without needing to receive a morphosyntactic analysis (with compound having no motivation whatsoever, in my opinion).

For the rest, I think there is not much to discuss: this is a regular copular construction in a language where verbal elements, copulae included, can also inflect for person, as happens in many other languages, but with particular rules for agreement (if I am not mistaken, something similar, i.e. a verb reverting to a "basic person" if the subject is made explicit, happens in Breton or Cornish, too).

AdeDoyle commented 1 year ago

I'm inclined to agree that the morphosyntactic analysis compound isn't the best option for no in Old Irish. That said, I don't think there really is any relationship which is entirely appropriate for Old Irish verbal particles anyhow, much less for the particular use case of no.

@dan-zeman did suggest the possibility of treating no as something other than a standalone syntactic word (March 19th, above), and we discussed this possibility in the following comments. @amir-zeldes then suggested (March 29th) the possibility that we could "Assume this is a multiword token that needs to be broken up..." but specified that in such a case "each subtoken gets its own deprel etc." This seems to conflict with your suggestion that as a MWT "no would simply appear in the surface token without need to receive a morphosyntactic analysis", unless I am misunderstanding something.

To my mind, regardless of the most appropriate deprel, the fact that various parts-of-speech (including pronouns, nouns and adjectives) can occur between no and a following verb makes it difficult to justify treating combinations of it with a following verb as either single syntactic tokens or MWTs.

Stormur commented 1 year ago

Yes, my point was that probably the best solution is not to assign this element any analysis inside the MWT it appears in, and this is indeed a different proposal from the other ones.

I am suggesting it because it has been remarked that this element has neither a sensible morphosyntactic nor a semantic analysis, and it appears just as phonetic support for the nasalisation of the following element. So it exists only in that specific context and nowhere else. Let me try to compare it with a couple of possibly similar phenomena in Italian:

- the "compound preposition" della 'of the (f.)'
  - analysed as ADP di + DET la: de exists as a (regional/old) variant of di 'of', so hypothetically we could also split this as de + la, but in any case we cannot assign anything to the "extra" l: it just exists at the token level, as a "surface representation", a purely phonological phenomenon
- old-fashioned/dialectal in istrada 'in the street' (instead of the common in strada)
  - not a MWT here, but the point is that istrada will just be lemmatised as a variant of strada 'street', without any analysis whatsoever of the epenthetic i
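In CoNLL-U, a multiword token is a range line (such as `1-2`) that carries the surface form. The syntactic words that receive the actual annotation follow it. The Python sketch below builds a minimal fragment for della, filling only ID and FORM and leaving the other columns as `_`. It shows that the extra l exists only on the range line. The helper names `row` and `mwt_words` are hypothetical, and this is a sketch of the format rather than a full CoNLL-U reader.

```python
COLUMNS = 10  # CoNLL-U has ten tab-separated columns


def row(tid, form):
    """Build a CoNLL-U line with only ID and FORM filled in."""
    return "\t".join([tid, form] + ["_"] * (COLUMNS - 2))


conllu = "\n".join([row("1-2", "della"), row("1", "di"), row("2", "la")])


def mwt_words(text):
    """Map each multiword token's surface form to its syntactic word forms."""
    ranges, words = [], {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tid, form = line.split("\t")[:2]
        if "-" in tid:  # range line such as "1-2"
            start, end = (int(x) for x in tid.split("-"))
            ranges.append((start, end, form))
        elif tid.isdigit():  # ordinary word; empty nodes like "1.1" are skipped
            words[int(tid)] = form
    return {surface: [words[i] for i in range(start, end + 1)]
            for start, end, surface in ranges}


result = mwt_words(conllu)
print(result)                    # {'della': ['di', 'la']}
print("".join(result["della"]))  # dila: the second "l" belongs to no word
```

Under the MWT proposal, no would be in the same position as the extra l. It would be present in the surface form but assigned to no syntactic word.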

Still, I recognise that a problem might arise when this no appears orthographically separated. But given what was said above, I would settle on part of speech X and relation dep from the head of the whole phrase, precisely since

there really is [not] any relationship which is entirely appropriate for Old Irish verbal particles anyhow

Even so, I am not sure this could be called a "verbal particle", at least not in the sense of participating in the predication somehow, as a negation or an aspect marker does; rather, it just "goes with a verb phrase".
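Under the X/dep suggestion, no becomes a sibling of the copula rather than its dependent. In this minimal Python sketch, the IDs are hypothetical, `PRED` is a placeholder predicate with its UPOS left as `_`, and the `dependents` helper is illustrative. It shows that the copula then has no dependents, which is what the leaf constraint requires.

```python
# Sketch of the "X + dep" proposal: "no" gets UPOS X and attaches with dep to
# the head of the whole phrase, so nothing hangs off the copula.
# Hypothetical IDs; "PRED" stands in for the predicate.

# ID -> (form, UPOS, head ID, deprel)
tokens = {
    3: ("no", "X", 5, "dep"),
    4: ("nda", "AUX", 5, "cop"),
    5: ("PRED", "_", 0, "root"),
}


def dependents(tokens, head_id):
    """IDs of the tokens attached to head_id, in ID order."""
    return [tid for tid, (_, _, head, _) in sorted(tokens.items()) if head == head_id]


print(dependents(tokens, 4))  # [] : the copula is a leaf
print(dependents(tokens, 5))  # [3, 4]
```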

Nothing better comes to my mind, since fixed and goeswith would make it the conventional head, and this would be bad, I think. But if it were possible to mark this as an accepted case of a word with spaces, as some are, like numbers and abbreviations, it could be even better (though this would possibly lead to MWTs with internal spaces, a strange beast, probably...). The fact that

various parts-of-speech (including pronouns, nouns and adjectives) can occur between no and a following verb makes

just tells me that its annotation, if any at all, should be as empty as possible, since, as said, it pertains to neither morphosyntax nor semantics, and it actually strengthens the case for MWT. In this sense, one of the reasons compound is problematic here (apart from being problematic itself...) is that it entails a meaning/function for the participating members, but this is not the case.