UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Syntactic vs. textual relations #936

Open nschneid opened 1 year ago

nschneid commented 1 year ago

I want to explore whether UD guidelines should articulate a distinction between syntactic relations and textual relations.

The second case covers conventions of textual organization that may involve document or discourse structure—rather than syntactic head-dependent structure—packed into one tree.

The clearest case of this is list, which is described as pertaining to juxtaposed metadata elements and/or units smaller than sentences formatted as lists.

"If the fields in the list are explicit and have a key-value structure, the key-value pair relations are labeled as appos." Note that appos mainly exists for appositions within a syntactic sentence, but it seems this key-value pairing is an extended technical use of appos—it is not clear that the key-value pair need to both be nominals or reversible to qualify for appos, though those criteria would apply for normal appositions.

Here is an example:

[image]

Another example given for list arguably overlaps with parataxis, specifically the case of side-by-side sentences (or sub-sentence units). "Another place where list has been used is for a sequence of attributes or descriptive terms used as the title line of a review (such as product or restaurant reviews, etc.):" e.g.

Long Lines, Silly Rules, Rude Staff, Ok Food

One possible way to interpret the distinction from parataxis is that list applies to that example only because it is a metadata field, rather than a part of the document that would be expected to be grammatically coherent.
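To make the two candidate labels concrete, here is a minimal sketch that writes the review-title example out as CoNLL-U rows. The helper name `to_conllu`, the UPOS tags, the comma attachments, and the choice to attach every later item to the first one are all assumptions made for illustration; only the `relation` argument changes between the `list` reading and the `parataxis` reading.

```python
# Hypothetical helper for illustration; not part of any UD tool.
# Tags and attachments below are assumptions, not an official analysis.
TOKENS = [
    # (form, upos, head, deprel); head 0 marks the root,
    # "ITEM" is a placeholder for the relation under discussion
    ("Long", "ADJ", 2, "amod"),
    ("Lines", "NOUN", 0, "root"),
    (",", "PUNCT", 5, "punct"),
    ("Silly", "ADJ", 5, "amod"),
    ("Rules", "NOUN", 2, "ITEM"),
    (",", "PUNCT", 8, "punct"),
    ("Rude", "ADJ", 8, "amod"),
    ("Staff", "NOUN", 2, "ITEM"),
    (",", "PUNCT", 11, "punct"),
    ("Ok", "ADJ", 11, "amod"),
    ("Food", "NOUN", 2, "ITEM"),
]


def to_conllu(relation="list"):
    """Render TOKENS as CoNLL-U rows, labeling the item links with `relation`."""
    rows = []
    for i, (form, upos, head, deprel) in enumerate(TOKENS, start=1):
        if deprel == "ITEM":
            deprel = relation
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        rows.append("\t".join(
            [str(i), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
        ))
    return "\n".join(rows)


print(to_conllu("list"))
```

Calling `to_conllu("parataxis")` gives the same tree with a different label on the three item links, which shows that the two analyses differ only in labeling, not in structure.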

This larger question of a syntactic vs. textual distinction bears on #933 (certain uses of colons). For example, in a list of students we might have key-value pairs where the key and value are not necessarily nominals:

Ann: junior
Kim: graduated 2018
Sam: unknown

One might be tempted to treat these as abridged versions of full sentences ("Ann is a junior", "Kim graduated in 2018"), but not all colon-separated pairs have a single obvious expansion. A more conservative option is to choose a label saying simply that these are paired phrases, treating the interpretation as a matter of pragmatic/stylistic conventions, not grammar. appos as generic key-value relator could apply here, or perhaps parataxis or orphan. Likewise, given the genre, the omission of "in" before "2018" can be interpreted not as an accident or ellipsis, but as an indication that "graduated 2018" is not a real syntactic phrase, and should be linked with parataxis.
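The options for one such pair can be laid side by side in a small sketch. Everything here is hypothetical scaffolding: the tokenization of "Kim: graduated 2018", the function name `candidate_edges`, the choice of the key as head, and the colon attachment are assumptions. The two parameters correspond to the two open questions: which label links key and value, and whether "2018" attaches with an ordinary syntactic label (the abridged-sentence reading) or with parataxis (the loose reading).

```python
# Illustration only: candidate analyses for "Kim : graduated 2018".
def candidate_edges(pair_relation, inner_relation="obl"):
    """Return (dependent, head, deprel) triples for one candidate analysis.

    pair_relation  -- label linking the value to the key
    inner_relation -- label linking "2018" to "graduated"
    """
    return [
        ("Kim", "ROOT", "root"),
        (":", "graduated", "punct"),
        ("graduated", "Kim", pair_relation),
        ("2018", "graduated", inner_relation),
    ]


for pair_rel in ("appos", "parataxis", "orphan"):
    for inner_rel in ("obl", "parataxis"):
        print(f"--- {pair_rel} / {inner_rel}")
        for dep, head, rel in candidate_edges(pair_rel, inner_rel):
            print(f"{rel}({head}, {dep})")
```

The tree shape is identical across all six combinations; the disagreement is entirely about what the labels claim regarding the status of the pairing.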

So, to frame the general issue: Do we take some relations to fall beyond syntax proper, deserving miscellaneous deprels? Or should we strive to exact maximum mileage from our syntactic deprels, even if that means interpreting some stylistic textual conventions as abridged forms of standard syntactic constructions?

sylvainkahane commented 1 year ago

I agree with you that some uses of our writing system are not the language itself, in the sense that it would be difficult to utter such lists of key-value pairs in a normal conversation. For instance, if I look at an HPSG feature structure, I can read it, and all the elements are English words, but if I utter it, it will be incomprehensible. This belongs to another system which is not exactly natural language (it is diagrammatic and the relative position of elements is fundamental). If we consider that these constructions are not part of natural language, we must introduce a special relation for them. I don't think that appos is appropriate, because standard appos connects conjuncts with the same referent: Bill, Mary's brother or Paris, the French capital. We already have list for connecting the key-value phrases together; we need a relation for connecting the key and the value.

amir-zeldes commented 1 year ago

I think if the two parts do not form an apposition, then there is still no need for a new relation. If there is an obvious transparent relation around something like a colon (@sylvainkahane's example, "Peter is in charge of Monday, Raphael : Tuesday"), then I think we should use it (in that example, orphan, since we would use that without the colon). Otherwise, if they do not form a constituent and we can't identify the relation, then parataxis should work fine IMO. It is the intended relation for things that just stand next to each other, so why not use it?

sylvainkahane commented 1 year ago

In spoken corpora, afaik, at least one of the two units linked by parataxis is a clause (or the nucleus of an illocutionary unit, to speak in pragmatic terms). This is of course the case when parataxis links two juxtaposed clauses, but also when it links a tag question or an interjected clause to the main clause, or when a parenthesis is inserted in the main clause. Maybe in written text parataxis is used in cases where neither of the two units is a clause, but I think in such cases another relation should be used. In the examples we are discussing here, we have two phrases that together form a kind of clause, and clearly neither of the two units forms a clause by itself. That is why I really don't like the idea of using parataxis here.
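The claim that at least one side of a parataxis link is a clause could be checked against a treebank with something like the following rough sketch. The clausality test is a heuristic assumed here for illustration (a verb or auxiliary, or a word with a subject, copula, auxiliary, or expletive dependent); the function names and the sample annotation of "Ann: junior" are likewise hypothetical.

```python
# Heuristic sketch; the clausality criteria are an assumption, not a UD definition.
CLAUSE_CUES = {"nsubj", "csubj", "cop", "aux", "expl"}


def parse_conllu(text):
    """Parse one CoNLL-U sentence into token dicts (basic columns only)."""
    tokens = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges and empty nodes
            continue
        tokens.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                       "head": int(cols[6]), "deprel": cols[7]})
    return tokens


def looks_clausal(token, tokens):
    """True if the token is verbal or has a dependent that signals a clause."""
    if token["upos"] in {"VERB", "AUX"}:
        return True
    return any(t["head"] == token["id"]
               and t["deprel"].split(":")[0] in CLAUSE_CUES
               for t in tokens)


def parataxis_report(tokens):
    """List (head form, dependent form, head clausal?, dependent clausal?)."""
    by_id = {t["id"]: t for t in tokens}
    report = []
    for t in tokens:
        if t["deprel"].split(":")[0] != "parataxis":
            continue
        head = by_id.get(t["head"])
        if head is None:  # parataxis attached to the root node; skip
            continue
        report.append((head["form"], t["form"],
                       looks_clausal(head, tokens), looks_clausal(t, tokens)))
    return report


# Hypothetical annotation of "Ann: junior" using parataxis for the pair.
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "Ann", "_", "PROPN", "_", "_", "0", "root", "_", "_"),
    ("2", ":", "_", "PUNCT", "_", "_", "3", "punct", "_", "_"),
    ("3", "junior", "_", "NOUN", "_", "_", "1", "parataxis", "_", "_"),
])

print(parataxis_report(parse_conllu(SAMPLE)))
# → [('Ann', 'junior', False, False)]
```

Rows where both flags are False are the cases at issue here: parataxis links in which neither unit is a clause. The heuristic is crude; for example, it would count a bare participle such as "graduated" as clausal because of its VERB tag.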

nschneid commented 1 year ago

I almost wonder if for UDv3 we should propose a new juxt relation for syntactically loose subclausal juxtapositions, to distinguish them from canonical cases of parataxis.

(And potentially, orphan could become juxt:orphan for the special case where a juxtaposition arises from a predicate ellipsis construction.)

amir-zeldes commented 1 year ago

"I almost wonder if for UDv3 we should propose a new juxt relation"

I think UD has enough relations already (in fact, I would be for removing list in v3 and just using parataxis for all of these)

"In spoken corpora, afaik, at least one of the two units linked by parataxis is a clause"

I see your point, but if parataxis is also the canonical relation for joining two independent sentences that are merged for some reason (incl. typographical), then we will inevitably have cases where one of the two 'sentences' is a fragment, so parataxis would also join nominals. From a traditional perspective, I think parataxis just means putting two things next to each other - in some terminologies, even explicit nominal coordination is considered a paratactic relation, even though we have the more specific conj for that. I for one am OK with having parataxis between nominals when we don't want to say that they are coordinate (i.e. they do not form an NP together, and one is not a predication of the other).

nschneid commented 1 year ago

Actually now I'm thinking along the lines of renaming list to juxt and moving some of the current cases of parataxis there, since parataxis is overloaded. But this is clearly a long-term discussion.

amir-zeldes commented 1 year ago

Agreed that parataxis is overloaded (at least the parenthetical use is something quite different from two things standing next to each other)

bulbulistan commented 1 year ago

The one problem with parataxis - and also conj - is that they cover both clausal and non-clausal dependencies. Elsewhere, we maintain a strict distinction (amod vs. acl, advmod vs. advcl, etc.), but not here. Is there a specific reason we don't do so here?