UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

The parent of 'orphan' should normally be 'conj' but it is 'reparandum' #635

Closed amir-zeldes closed 5 months ago

amir-zeldes commented 5 years ago

Hi - a recent update to the validator creates the error message in the title, however in the Coptic corpus we have an exception that looks correct to me: a reparandum consisting of two dependents whose head is missing. I'll give the example in English for simplicity:

... they hav-- every community according to our...
reparandum(community,they)
orphan(they,have)

The alternative of saying that both 'they' and 'have' are reparandum is unappealing, because there is a whole interrupted phrase ("they have") which results in a single repair. The option of treating 'they' as the subject of 'have' is not available in Coptic, since the equivalent of have is a past auxiliary which never takes the subject directly (it is aux). Basically there is a missing verb that would have dominated "they have", so in its absence we've promoted the subject, and treated the auxiliary as an orphan.

Any suggestion is appreciated, but if there isn't a good reason to reject orphan I would suggest allowing reparandum as a parent of orphan.

dan-zeman commented 5 years ago

I admit that when adding the validation rule, I assumed that it would spark discussion and possibly it would have to be loosened. Though the example you give is quite beyond my imagination :-) I have moved the issue to the docs repository because it is about precise interpretation of the guidelines (as most validation issues). The validator just tries to make sure that people can make assumptions about the data, if the assumptions follow from the documentation. The immediate impulse to introduce this rule was when I saw an orphan attached to another orphan (people probably forgot that the promoted orphan should not be attached via this relation).

In my understanding, the orphan relation was introduced solely for gapping and stripping. That is, the missing node is a predicate and the orphaned dependents are its arguments or adjuncts. It typically occurs in coordination, so the promoted argument is attached to the first predicate as conj; but the validator would also accept parataxis (which some people use instead of coordination), root (because the source can be in the previous sentence) and even advcl (because subordination can generate similar situations). It seems quite plausible that a similar pattern could occur also with reparandum, so I think I should add it. But I am not convinced that your particular example should involve the orphan relation. The guidelines assume that if the main verb is elided and an auxiliary remains, then the auxiliary is promoted and the other dependents are attached to it as if it was the main verb; which would result in the relation nsubj(have, they), which you reject.

dan-zeman commented 5 years ago

I whitelisted reparandum in https://github.com/UniversalDependencies/tools/commit/e8178934276e8e452ebf6f2db71d09a48f316d57. Errors of this type are currently not reported in Coptic. I will leave this issue open for a while so that others can contribute to a better understanding of the orphan relation.

amir-zeldes commented 5 years ago

Thanks! The idea of promoting the auxiliary makes a lot of sense in a language like English, where the auxiliary is itself a verb. In the Coptic example, the element in question is really just a functional auxiliary, with no chance of being used as a verb, so it seems a little stranger to promote it, rather than a core argument. In terms of parallels elsewhere in the corpus, orphan is often a dependent of the subject, which gets first choice as the argument to promote - for that reason I would be inclined to keep the subject as the head and say that it governs all other dependents of the missing verb - in this case also the tense marker.

jnivre commented 5 years ago

The point is not that the auxiliary can be used as a main verb but that it belongs to the same nucleus (in Tesnière's sense) as the main verb. As long as there is something left of the verb group, we prefer to let this represent the verb group so that other dependents can retain their true dependency relations (rather than "orphan"). It is for the same reason that we, for example, promote determiners to head elliptic noun phrases even though they can never head an ordinary noun phrase.

amir-zeldes commented 5 years ago

Both subject and auxiliary are dependents of the (missing) main verb - in my opinion the question is only which would we rather promote. Either way some information will be lost:

  1. If we promote the auxiliary, we gain the information that the subject was probably a subject, but we lose the information that ellipsis has taken place (as marked by orphan)
  2. If we promote the subject, we lose its function (since it must now bear the reparandum relation), but we retain the information about ellipsis, since its dependent will be orphan, the same relation used in other cases where a subject stands in for a missing verb

One problem with 1. for Coptic is that auxiliaries aren't present in all tenses, so we would end up with situations in which we have ellipsis and a. we do have orphan, but aux is the head; b. we do have orphan, and nsubj is the head (since there is no aux) and c. where there is no orphan at all. Promoting the subject uniformly seems like a better choice for the data we have.

I agree that in languages where the auxiliary is a finite verb (like English) it is more intuitive to promote the auxiliary, but in this case it seems like more information would be discarded, and a very odd government pattern would result (AUX ->nsubj PRON, which is impossible in Coptic, and with no trace of an orphan)

dan-zeman commented 5 years ago

@amir-zeldes : Just a note – orphan is not a means to mark ellipsis. Most instances of ellipsis are simply lost in UD (that is, they are hidden due to promotion). The purpose of orphan is to avoid certain relations that only occur in some instances of ellipsis and that would be "too strange". So if nsubj(AUX, PRON) is "too strange" in Coptic, then this is possibly the argument in favor of orphan. (But then we would have to document it. The aux relation is not listed in our obliqueness hierarchy, for example.)

sylvainkahane commented 5 years ago

To come back to the main topic of this issue, non-constituent conjuncts are quite common with reformulations. Examples:

it is a good a very good question

you said something about the about my question

I think that you that we must go

If we use reparandum for such cases, the reparandum phrase (in italic in the examples) certainly needs an orphan relation.

dan-zeman commented 5 years ago

@sylvainkahane I think that the three examples you gave would be solved without orphan according to the UD guidelines. Simple promotion of one of the orphaned dependents.

det(good-4, a-3); det(question, a-5); advmod(good-7, very); amod(question, good-7); reparandum(good-7, good-4)

case(the, about-4); case(question, about-6); det(question, my); reparandum(my, the)

mark(you-4, that-3); mark(go, that-5); nsubj(go, we); reparandum(we, you)

lauma commented 5 years ago

https://github.com/UniversalDependencies/docs/issues/635#issuecomment-497458687 :

In my understanding, the orphan relation was introduced solely for gapping and stripping. That is, the missing node is a predicate and the orphaned dependents are its arguments or adjuncts. It typically occurs in coordination, so the promoted argument is attached to the first predicate as conj; but the validator would also accept parataxis (which some people use instead of coordination), root (because the source can be in the previous sentence) and even advcl (because subordination can generate similar situations). It seems quite plausible that a similar pattern could occur also with reparandum, so I think I should add it. But I am not convinced that your particular example should involve the orphan relation.

If advcl is okay, is acl any different? If I understand correctly, it can be a subordinate clause with its own potentially gapped predicate the same as advcl?

E.g., in Latvian, saying viņš ēd tos ābolus, ko pirms tam [ēda] tārpi ('he eats the same apples, which were [eaten] by worms before that') is rather plausible.

dan-zeman commented 5 years ago

Sounds good to me. Added acl.

lauma commented 5 years ago

And what about other subordinate clauses - csubj and ccomp? Latvian sometimes just omits the verb in the subordinate clause, even if it is not explicitly repeated, but is just easy to deduce from all the other parts of that clause. We have the sentence atjēdzos, ka bez angļu valodas nekur [netikšu] '[I] realised that [I will get] nowhere without English' in our data. For us it felt most natural to use an analysis with ellipsis here, but is this appropriate for UD?

dan-zeman commented 5 years ago

Well, perhaps all deprels that can mark incoming edges to heads of clauses make the heads technically eligible for outgoing orphan edges? Although this example seems even further from prototypical gapping. What do others think about this (@jnivre @manning @sebschu)?

Note that I don't doubt that this actually is ellipsis; but most types of ellipsis are annotated in UD without using the orphan relation. So I think the question is not whether it is ellipsis but rather if it is (sufficiently similar to) gapping.

sebschu commented 5 years ago

Yes, this is an interesting data point that we haven't considered so far. I always consider orphan to be appropriate when there is an elided predicate with multiple dependents and in our UDW-17 paper, we argued (like Gerdes and Kahane, 2015) that this also includes sentences with elided predicates where the predicate only appears in a preceding sentence (rather than in another clause in the same sentence).

It seems like this case is a little different since the predicate does not necessarily appear anywhere in the preceding discourse (if I understood correctly) but it still fulfills the criterion of a missing predicate with multiple dependents. So in short, yes, I think using ccomp to attach English and orphan to attach the complementizer and nowhere would be the right call for this sentence.

KoichiYasuoka commented 5 years ago

In Classical Chinese, orphan occurs very rarely, and where it does, the relation was originally cc before stripping. A typical example: "學而習" (study and practice) became "學而" in a chapter title. In this case, the conj in 學―conj→習 has gone away, and 而←cc―習 has become orphan. How should we handle this for validation?

dan-zeman commented 5 years ago

This does not look like a case for orphan to me. Even in clauses where orphan is used, it does not replace a cc relation. In the gapping examples in the guidelines, the promoted heads in the gapped clauses still have cc children.

One possibility would be to simply attach the conjunction to the remaining verb, i.e., to the left: 學―cc→而. But that would mean we do not see any ellipsis there.

If you know there is a verb missing, the standard way is to pick one of the nodes that would depend on it, and promote it as the substitute head of the clause. The clause is still connected to its parent node with the relation that holds between the two clauses, i.e., conj. But as the clause is now represented by a substitute head node, the relation leads to this new head. In our case, only one node is left from the clause, and it is the conjunction. Therefore the conjunction will be promoted and we will have 學―conj→而.

KoichiYasuoka commented 5 years ago

Thank you for your comment, @dan-zeman. I have tried acl to link to the "parents" of the orphans. I understand that this is not a good choice, but the only two orphans that were a problem for us may be resolved this way for now.

gossebouma commented 5 years ago

The constraint on the parent of orphan leaves me with a bit of a problem for cases like this:

opgesplitst in een Vlaamse en een Franstalige partij
split in a Flemish and a French-speaking party

obl(opgesplitst,Vlaamse) orphan(Vlaamse,een-1) cc(en,partij) conj(Vlaamse,partij)

Should we allow configurations like this? The only alternative I see is accepting det(Vlaamse,een) (Vlaamse is an adjective).

Any suggestions?

gossebouma commented 5 years ago

I only noticed now that @jnivre wrote:

we promote determiners to head elliptic noun phrases even though they can never head an ordinary noun phrase.

So is that the solution here?

dan-zeman commented 5 years ago

So is that the solution here?

Yes.

opgesplitst in een Vlaamse en een Franstalige partij

obl(opgesplitst, Vlaamse) case(Vlaamse, in) det(Vlaamse, een) conj(Vlaamse, partij) cc(partij, en) det(partij, een) amod(partij, Franstalige)
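
For readers who prefer to see the relations above as treebank rows, here is a minimal sketch that renders them in the ten-column CoNLL-U layout. Only ID, FORM, HEAD and DEPREL are filled in; the other columns are left as `_` because the thread does not give lemmas or tags. The function name `to_conllu` is hypothetical, and attaching "opgesplitst" as root is an assumption made only so that the fragment is a complete tree.

```python
# Tokens of the fragment, numbered from 1 as in CoNLL-U.
FORMS = ["opgesplitst", "in", "een", "Vlaamse", "en", "een", "Franstalige", "partij"]

# (deprel, head_id, dependent_id) triples copied from the analysis above.
RELATIONS = [
    ("obl", 1, 4),
    ("case", 4, 2),
    ("det", 4, 3),
    ("conj", 4, 8),
    ("cc", 8, 5),
    ("det", 8, 6),
    ("amod", 8, 7),
]


def to_conllu(forms, relations, root_id=1):
    """Render ID, FORM, HEAD and DEPREL; all other columns stay '_'."""
    head_of = {dep: (head, rel) for rel, head, dep in relations}
    # Assumption: the thread does not say what "opgesplitst" attaches to,
    # so it is treated as the root of this fragment.
    head_of.setdefault(root_id, (0, "root"))
    lines = []
    for i, form in enumerate(forms, start=1):
        head, rel = head_of[i]
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        lines.append("\t".join([str(i), form, "_", "_", "_", "_", str(head), rel, "_", "_"]))
    return "\n".join(lines)


print(to_conllu(FORMS, RELATIONS))
```

Note that no row carries the orphan relation: both determiners attach as det, to the promoted adjective "Vlaamse" and to "partij" respectively.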

sylvainkahane commented 5 years ago

Does nobody use orphan in comparisons? I am thinking of sentences such as "Today I received the same message as you yesterday".

dan-zeman commented 4 years ago

@sylvainkahane : I believe orphan is used also in comparison in constructions like the one you mentioned. I think it has been discussed somewhere but I do not see the example directly in the guidelines.

hanneme commented 1 year ago

A follow-up to this one: Working on the PROIEL/TOROT conversion (with @daghaug) we have cleaned up our processing of elliptic structures considerably, but we are still getting orphan error messages for various types of clausal heads with orphans.

Right now we see that elliptic headless relative clauses that are themselves subjects (not modifying any nominal) trigger the orphan error message "The parent of 'orphan' should normally be 'conj' but it is 'nsubj'", such as in this Latin example:

qui multum non abundavit et qui modicum non minoravit
"(The one) who (gathered) much did not have too much, and (the one) who (gathered) little did not have too little"

where we now get nsubj(abundavit,qui) and orphan(qui, multum) and ditto in the second part of the sentence.

Currently, then, such argument relative clauses are nsubj and obj in our conversion, but the error message made us wonder if they should actually be csubj and ... ccomp? As far as I can see the guidelines just assume that relative clauses always modify a nominal. What do you say, @dan-zeman?

amir-zeldes commented 1 year ago

qui multum non abundavit et qui modicum non minoravit

I agree based on the Latin example that parent of orphan should also be allowed to be root (and therefore possibly also parataxis, although that is not necessary for this example)

the error message made us wonder if they should actually be csubj

I think it should be csubj if the following version would also have been csubj in the Latin guidelines:

qui collegit multum non abundavit
"the one who collected much did not have too much"
root(abundavit) csubj(abundavit, collegit) nsubj(collegit, qui)

In that case, "qui" is just being promoted to cover for the missing relative clause subject (at least I assume that's how it would be annotated, but if that is not the case in the Latin guidelines and "qui" is seen as a matrix argument, then the elliptical version is also not a clause).

hanneme commented 1 year ago

A version with a non-elliptic relative clause would be analysed as follows in our current version of the conversion script:

root(abundavit) nsubj(abundavit, collegit) nsubj(collegit, qui) obj(collegit, multum)

And surely the elliptic clause must go the same way. I'm not sure what the other Latin treebanks do (do you know, @daghaug?), but wanted to check if there is a general UD policy for headless relative clauses.

In any case, I think any clausal head type must allow orphan dependents, since ellipsis is in principle always possible, so if nsubj is allowed for headless relative clauses, nsubj must allow orphan dependents, if it must be csubj then csubj must allow orphan dependents etc.

nschneid commented 1 year ago

To make sure I understand, an attempt at an English analogy:

Is this right? It does seem like a valid use case, since we wouldn't normally promote a subject or object as head of a clause when the predicate is missing. Though it is a pity we can't see that there's a free relative construction in the orphan analysis.

amir-zeldes commented 1 year ago

nsubj(abundavit, collegit)

This seems a bit strange to me, since collegit is a verb, so I would have expected csubj

Is this right?

The English translation seems basically equivalent, except that in Latin we have a plain relative pronoun "qui", which is basically like "who" rather than "whoever". So this is more like Shakespeare's "who steals my purse steals trash", with "steals" elided in the first conjunct of a coordination.

hanneme commented 1 year ago

This seems a bit strange to me, since collegit is a verb, so I would have expected csubj

Yes, I can see why. But if we do that, the next question is what to do with object relative clauses, which also occur aplenty. Should they be ccomp? Our converter now has them as obj. (They can of course have ellipsis too.)

reddite ergo quae Caesaris sunt Caesari
give thus [which.nom.pl Caesar's are] to-Caesar

daghaug commented 1 year ago

So in the PROIEL annotation, free relative clauses are treated as syntactically nominal because they distribute exactly like NPs and not like clauses.

In subject position, the csubj/nsubj distinction is maybe not so important, but free relative clauses occur in other nominal positions as well.

When they occur in object position, we would presumably have to label them ccomp if we take them as clausal. Perhaps not a disaster, but it would definitely give the impression that some verbs can take complement clauses when in fact they only take NPs (and free relative clauses).

Probably the most disturbing case would be the one where the free relative clause is the complement of a preposition as in

videbunt in quem transfixerunt 'they will look at the one they pierced' (literally 'they will look at whom they pierced')

This is

obl(videbunt, transfixerunt) obj(transfixerunt, quem) case(transfixerunt, in)

If we treat free relative clauses as clausal, I guess it would have to be advcl? And the preposition would have to be considered mark?

nschneid commented 1 year ago

Oh, I see the problem. The free relatives are treated as clauses lacking a nominal head, which is different from how we treat them in English: https://universaldependencies.org/en/dep/acl-relcl.html#free-relatives

Is it an option to treat the WH-word serving as subject as the head of the clause, and indicate the subject relation in the Enhanced Dependencies? So

nsubj(abundavit, qui) acl:relcl(qui, collegit) E:nsubj(collegit, qui) - enhanced dependency

amir-zeldes commented 1 year ago

I think either analysis is possible, and I understand the pros and cons. If this is the normal and only way to do free relatives in Latin, then my gut feeling is that what Nathan is suggesting makes the most sense. We had some similar thoughts in Coptic, but that language is more like English in that most free relatives have an explicit nominalization (something like "the one who"), and the examples with a plain relativizer (something like "who", except it's an indeclinable relativizer) are more rare, so we made those take clausal deprels. But canonically, yes, I would expect free relatives to take nominal deprels, among other things for the reasons Dag outlined above.

daghaug commented 1 year ago

That's right, they are treated differently. The reason is that the case of the relative pronoun is governed by its function inside the relative clause. So if it's a downstairs object it would be accusative, as in "quem vidi, venit" (literally 'whom I saw arrived'), and it would be strange to take this accusative pronoun as the subject of "venit" (arrive) rather than the object of "vidi" (saw).

That said, we will preserve the original annotation in our source data, and we could give it up in the UD conversion for the sake of consistency if there were clear rules for how to deal with free relatives, but the web page does not exactly suggest that. Basically we are following the annotation suggested for Czech in (in the case where the demonstrative is elided).

nschneid commented 1 year ago

Ah, yeah there's less of a case argument to be made for English since the who/whom distinction is disappearing, though technically "whoever saw me" vs. "whomever I saw" has the same issue I guess—case is assigned by the relative clause.

hanneme commented 1 year ago

I looked at the validation script now, and the permitted head relations for orphans are currently conj, parataxis, root, csubj, ccomp, advcl, acl and reparandum. If we are to continue treating relative clauses like nominals, which I would prefer for the reasons @daghaug lists, a much wider range of relations would have to be permitted (or at least be exceptions for this type of language). Apart from this we also get real examples of ellipsis at least with xcomp and dislocated.

xcomp: We have occasional examples of the type "He wanted (to go) to Jerusalem on foot" where it's clearly not the modal verb that takes the PP argument and adjunct. Old East Slavonic example: xočem na smerdy i pogubiti ě 'we want (to go) after the peasants and kill them' (where an elliptic xcomp is coordinated with a non-elliptic one).

dislocated: We currently use dislocated for preposed correlative clauses, of the type "What he said, that we understood" and "Where you go, there we will follow", and these can of course be elliptic too. (We could do acl/advcl instead for these.)
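
To make the rule under discussion concrete, here is a minimal sketch of such a check. It is not the validator's actual code: the token format (dicts with `id`, `head`, `deprel`), the function names, and the choice to strip subtypes before comparing are all assumptions made for illustration. The set of permitted relations and the wording of the message are taken from this thread.

```python
# Relations listed in this thread as currently accepted on the parent of an orphan.
ALLOWED_ORPHAN_PARENT_DEPRELS = {
    "conj", "parataxis", "root", "csubj", "ccomp", "advcl", "acl", "reparandum",
}


def universal(deprel):
    """Strip a subtype, e.g. 'acl:relcl' -> 'acl' (assumed behaviour)."""
    return deprel.split(":")[0]


def check_orphan_parents(tokens):
    """Return one message per orphan whose parent bears an unexpected relation.

    tokens: list of dicts with 'id', 'head' and 'deprel' (hypothetical format).
    """
    by_id = {t["id"]: t for t in tokens}
    messages = []
    for t in tokens:
        if universal(t["deprel"]) != "orphan":
            continue
        parent = by_id.get(t["head"])
        if parent is None:
            continue  # head 0 or a dangling head id: not handled in this sketch
        parent_rel = universal(parent["deprel"])
        if parent_rel not in ALLOWED_ORPHAN_PARENT_DEPRELS:
            messages.append(
                f"Token {t['id']}: The parent of 'orphan' should normally be "
                f"'conj' but it is '{parent_rel}'"
            )
    return messages


# First half of the Latin example: nsubj(abundavit, qui), orphan(qui, multum).
# The advmod attachment of "non" is an assumption, not given in the thread.
LATIN = [
    {"id": 1, "form": "qui", "head": 4, "deprel": "nsubj"},
    {"id": 2, "form": "multum", "head": 1, "deprel": "orphan"},
    {"id": 3, "form": "non", "head": 4, "deprel": "advmod"},
    {"id": 4, "form": "abundavit", "head": 0, "deprel": "root"},
]

print(check_orphan_parents(LATIN))
```

Under this sketch, keeping free relatives nominal means every nominal relation that can host such a clause (nsubj, obj, obl, ...) would have to be added to the set, which is the widening described above.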

daghaug commented 1 year ago

Can I bring this to the attention of @dan-zeman because we need to know how to deal with this in the conversion?

The issue is that the validation rules for orphan enforce particular analyses on other constructions. So for the free relative clauses, we can make them nominal, but only if we take the wh-word as the head, or we have to leave the wh-word where it belongs for case reasons, but only if we make them clausal. If there was a UD standard, we'd be happy to go either way, but as long as there isn't, we would really prefer to keep our analysis as is. (I could also give arguments for it, but that's really for somewhere else - I think these are nominalizing constructions, in much the same way as morphology can be nominalizing.)

So if the validation rules are not going to change, I think the best solution for us might be to take these sentences out of the converted data set until the status of free relative clauses (and the modal constructions and the correlatives, as mentioned by Hanne) is clarified. But it would be good to know soon what we should do...

dan-zeman commented 1 year ago

There is currently no UD-wide consensus on free relatives as far as I know, and perhaps they should stay language-specific. As you have noticed, the perspective we take in Czech is different from what people do with the English data.

qui multum non abundavit et qui modicum non minoravit "(The one) who (gathered) much did not have too much, and (the one) who (gathered) little did not have too little" where we now get nsubj(abundavit,qui) and orphan(qui, multum) and ditto in the second part of the sentence.

If we assume that there are two nodes elided in each clause, 1. "the.one", and 2. "gathered", and if we also assume that this does not qualify as (similar enough to) gapping, then qui will be first promoted to the head of the relative clause (thus acquiring the acl:relcl relation) then further promoted to the place of "the.one" (thus acquiring nsubj). Multum will be attached to qui as obj. Because of the missing verb in the relative clause, you get qui in the main clause even without treating free relatives as in English (while if "gathered" was present, it would be the promoted node and you would have a verb attached as nsubj in the main clause). However, if you do treat free relatives as in English, then you already have qui in the main clause without promotion. The verb "gathered" would be attached to it as acl:relcl. The verb is not present though, and there is only one orphaned dependent, which will be promoted and inherit the relation, i.e., you get acl:relcl(qui, multum). No orphan relation will be used.

Now getting back to the first option where we did obj(qui, multum), assuming that this is not gapping. "Gathered" is a verb and qui and multum are its subject and object, respectively, which makes it similar to the situation that led us to define the orphan relation for gapping. Yet it is different from gapping because there is no indication in this or the neighboring sentences that the missing verb is "gathered" – that seems purely hypothetical, based on semantics or pragmatics (while gapping is closer to syntax: you simply do not repeat the verb that is overtly present in another conjunct). If you ever add enhanced representation to the corpus, you should be ready to insert an empty node representing "gathered" and make the two nominals its arguments.

I don't know which of the options outlined above is the best one. But the double ellipsis and double promotion in this example suggests that almost anything can be the head of an orphan relation, and the validation test may have to be abandoned. (I introduced it because people misunderstood orphan as a general remedy that they should use every time they sense ellipsis around — while in fact it should be used only in a very restricted subset of ellipsis.) Or perhaps the test should be reclassified from an error to just a warning?

hanneme commented 1 year ago

Thank you, Dan! In the original PROIEL annotation this sentence does of course have empty nodes with argument dependents, that is the point of departure for our conversion.

I think it might be nice to reclassify the test as a warning, we certainly found a lot of issues with our ellipsis handling because of those error messages.

Stormur commented 1 year ago

I am sorry to come late here, but I missed that this topic also touches upon some issues in Latin annotation that we addressed in the past months (regarding IT-TB, LLCT and UDante treebanks).

@hanneme @daghaug , I invite you to take a look at the documentation pages that I wrote for free relative clauses in Latin. I think that they were not already there in April, but they appeared soon after (we had some internal discussion in our group).

Basically, following general UD criteria, we are using clausal relation (csubj, ccomp/xcomp, advcl) for "free relatives". The "double pronoun" is always an internal argument of these clauses (as commented in the guidelines, we were acting differently, but doing otherwise created weird, hardly justifiable structures - and I am actually convinced this is valid in general, not only language-specifically). Now, all these relations also take the :relcl subrelation to distinguish/retrieve them (and thus please note that advcl:relcl means a rather different thing in Latin than in English).

So, taking your sample sentence

qui multum non abundavit et qui modicum non minoravit

the annotation will be as follows:

csubj:relcl(abundavit,qui) orphan(qui,multum)

qui is internally promoted as the head in that it is the subject.

I think the validator does not complain here, does it? Or does it just issue a warning?

I understand the point that these clauses are acting nominally and very much agree that they should be able to take nominal relations, and think that this should be the future direction for UD's guidelines, but for the moment this is a sensible compromise. In your conversion, probably it is easy to convert an nsubj relation into csubj:relcl etc. if it points to a predicate.

lukatercon commented 12 months ago

In Slovenian we seem to have found a case where an orphaned element also exhibits ellipsis of the clausal head. This leads to an orphaned element attaching itself to another orphaned element and triggers the validation warning "The parent of 'orphan' should normally be 'conj' but it is 'orphan'".

The example in Slovenian is given below, with an added English equivalent. (The verbs in [square brackets] are added in English to emphasize the words that are not present in the original Slovenian sentence.)

Prav je, da so za tak dogodek zaprli cesto, saj če jo za vsako kolesarsko dirko, jo lahko tudi za četvorko.

It is right that they closed the road for such an event, since, if they [close] it for every bicycle race, they can also [close] it for a dance event.

Both the main clause of the second conjunct as well as its clausal dependent (the if clause, which would normally be advcl) lack a verb. Thus, we analyze this as orphan(jo-17, jo-12) and orphan(jo-12, dirko) (in English this would correspond to orphan(it-26, it-17) and orphan(it-17, race) with the obj being promoted to the role of clausal head in the former case). Here is a representation of the analyzed structure in Slovenian:

[image: dependency tree of the Slovenian example]

There is no other option than to mark the direct object as the promoted clausal head and use the orphan relation, so we believe the validation script should not produce a warning in this case. Note that lahko is a modal adverb that functions in a similar way to the auxiliary verb can in English. However, it is formally not an auxiliary and always receives the advmod dependency relation, thus it cannot function as clausal head without creating misleading dependencies.

dan-zeman commented 11 months ago

This is an interesting example, maybe we should show it somewhere in the guidelines. I agree with your analysis. That is why what the validator produces is a warning and not an error. (Warnings do not make your treebank invalid.)

It probably still makes sense to issue the warnings because cases like this are rare. And the validator can hardly know that this sentence is different from other cases where people attempt to chain two orphan relations. (Actually it might be possible to infer from enhanced dependency representation if it were present; but it is not available for Slovenian as far as I know.)

Stormur commented 11 months ago

In this specific case, since this is an elliptic adverbial clause inside a main elliptic clause, why can advcl not be used (between the two jo)? The two "orphanhoods" are each in their own clause.

dan-zeman commented 11 months ago

In this specific case, since this is an elliptic adverbial clause inside a main elliptic clause, why can advcl not be used (between the two jo)? The two "orphanhoods" are each in their own clause.

Because the guidelines say that orphan should be used. The complete (pre-ellipsis) upper clause contains one obj (jo), one obl (tudi za četvorko), and one advcl (če jo za vsako kolesarsko dirko). In the obliqueness hierarchy in the guidelines, the order of these three is obj > obl > advcl. Therefore, jo is promoted and the other two are attached to it as orphan.
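
The promotion step described here can be sketched as follows. This is only an illustration under stated assumptions: the function name `promote` and the input format are hypothetical, the full ordering in `OBLIQUENESS` is my recollection of the hierarchy in the guidelines (only obj > obl > advcl is confirmed in this thread), and the sketch covers only the case where no auxiliary or copula is left to promote.

```python
# Obliqueness hierarchy as recalled from the UD guidelines on orphan.
# Only the relative order obj > obl > advcl is confirmed by this thread.
OBLIQUENESS = [
    "nsubj", "obj", "iobj", "obl", "advmod", "csubj",
    "xcomp", "ccomp", "advcl", "dislocated", "vocative",
]


def promote(dependents):
    """Pick the substitute head among the dependents of an elided predicate.

    dependents: list of (form, deprel) pairs, where deprel is the relation the
    word would have borne to the missing predicate.
    Returns (promoted_form, [(form, 'orphan'), ...]) for the remaining ones.
    Raises KeyError for relations outside the hierarchy.
    """
    rank = {rel: i for i, rel in enumerate(OBLIQUENESS)}
    ordered = sorted(dependents, key=lambda d: rank[d[1].split(":")[0]])
    promoted, rest = ordered[0], ordered[1:]
    return promoted[0], [(form, "orphan") for form, _ in rest]


# Upper clause of the Slovenian example: obj (jo-17), obl (headed by četvorko),
# and the advcl, itself elliptic and represented by its promoted head jo-12.
head, orphans = promote([("jo-12", "advcl"), ("jo-17", "obj"), ("četvorko", "obl")])
print(head, orphans)
```

The result has jo-17 as the promoted head with the other two attached as orphan, which yields the orphan(jo-17, jo-12) edge that triggers the warning.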

Stormur commented 11 months ago

OK, I see this now.

Thanks for pointing me to this, I might have to revise some things... but at the same time, I find there is something problematic about it, though that goes beyond this topic.