UniversalDependencies / UD_Irish-IDT

Irish data

Subject and Predicate of Questions with Relative Clauses #141

Open laurenCassidy opened 3 years ago

laurenCassidy commented 3 years ago

TL;DR:

I think interrogative pronouns should be annotated as the head/predicate of questions with relative clauses and the subsequent phrase should be annotated as subject.

Full description:

I think the head of a relative clause should never be the root of a sentence, because relative clauses are subordinate clauses that refer to an antecedent in the primary clause (see Nualeargais - relative clause).

According to Christian Brothers' Section 13:

Cá, cad, céard, cén are the interrogative pronouns. They contain an implicit copula and select for a relative clause, direct or indirect, as an argument.

Section 16.5 gives examples of the copula predicate subject word order such as:

cad é sin?
predicate (with implicit copula) = cad
subject = é sin

I think we should annotate questions with relative clauses the same way. e.g.

Cad a d'ith sí?
predicate (with implicit copula) = Cad
subject = a d'ith sí

root(ROOT, Cad)
csubj:cleft(Cad, ith)
mark:prt(ith, a)
mark:prt(ith, d')
nsubj(ith, sí)

According to Nualeargais:

The interrogative is considered then the predicate of the copular clause. The subject is now the rest of the question. e.g.

Cé hé?
predicate (with implicit copula) = cé
subject = hé

root(ROOT, Cé)
nsubj(Cé, hé)

Cén bhliain a d'éag Charles Dickens?
predicate (with implicit copula) = cén
subject = bhliain a d'éag Charles Dickens

root(ROOT, Cén)
nsubj(Cén, bhliain)
acl:relcl(bhliain, d'éag)

If a verb should be incorporated (e.g. "who says that?"), there is still the need of a real subject, which is then replaced by a relative clause with the verb.

e.g.

Cé a thug an leabhar do Mháire?
predicate (with implicit copula) = cé
subject = a thug an leabhar do Mháire

i.e.

root(ROOT, Cé)
csubj:cleft(Cé, thug)
obj(thug, leabhar)
obl(thug, Mháire)

In this way the annotation of the question would match with the annotation of the answer:

Is mise a thug an leabhar do Mháire.
root(ROOT, mise)
cop(mise, is)
csubj:cleft(mise, thug)
obj(thug, leabhar)
obl(thug, Mháire)
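
To make the correspondence concrete, here is a minimal Python sketch (not part of any treebank tooling) that parses the triples above and checks that the question and the answer have the same structure once the focused word is abstracted away. The helper names parse_triples and skeleton are hypothetical, and dropping cop before comparing is an assumption that reflects the copula being implicit in the question.

```python
import re

# Matches lines such as "csubj:cleft(Cé, thug)" -> (relation, head, dependent).
TRIPLE = re.compile(r"^\s*([\w:]+)\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)\s*$")

def parse_triples(text):
    """Parse one 'rel(head, dependent)' triple per non-empty line."""
    triples = []
    for line in text.strip().splitlines():
        m = TRIPLE.match(line)
        if m is None:
            raise ValueError(f"not a dependency triple: {line!r}")
        triples.append(m.groups())
    return triples

def skeleton(triples, focus, ignore=("cop",)):
    """Replace the focus word with a placeholder and drop ignored relations.

    Assumption: 'cop' is ignored because the copula is implicit in the question.
    """
    def swap(word):
        return "FOCUS" if word == focus else word
    return {(rel, swap(h), swap(d)) for rel, h, d in triples if rel not in ignore}

question = parse_triples("""
root(ROOT, Cé)
csubj:cleft(Cé, thug)
obj(thug, leabhar)
obl(thug, Mháire)
""")

answer = parse_triples("""
root(ROOT, mise)
cop(mise, is)
csubj:cleft(mise, thug)
obj(thug, leabhar)
obl(thug, Mháire)
""")

same = skeleton(question, "Cé") == skeleton(answer, "mise")
print(same)  # True
```

The comparison only looks at the relations listed in this post; a real check would also need to handle particles and punctuation.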

Of course, there may be several different semantically equivalent answers to the same question. e.g.

Q1: Céard a tharlós dóibh?
AD (Declarative answer): Tarlóidh X dóibh.
AC (Cleft answer): Is X a tharlós dóibh.

Semantically, in both answers, X is the agent which performs the verb tarlaigh.

AD is a declarative statement and so follows the expected word order. Thus AD corresponds more appropriately to the yes/no question:

Q2: An dtarlóidh X dóibh? 

root(ROOT, tarlóidh)
nsubj(tarlóidh, X)

AD: Tarlóidh (X dóibh).

root(ROOT, tarlóidh)
nsubj(tarlóidh, X)

In AC, syntactically, the focus/topic X has been fronted to predicate position.

Questions with relative clauses have an inherent focus, i.e. the question word or missing information, and so they should be annotated in the same way as cleft sentences with the focus on the same element.

Q1: Céard a tharlós dóibh?

root(ROOT, Céard)
csubj:cleft(Céard, tharlós)

AC: Is X a tharlós dóibh.

root(ROOT, X)
csubj:cleft(X, tharlós)

See also:

colinbatchelor commented 3 years ago

I have been inconsistent with this in the Scottish Gaelic treebank and this sounds like an excellent solution. Many thanks!

colinbatchelor commented 3 years ago

In Cad a d'ith sí? wouldn't it be

obj(ith, a) rather than mark:prt(ith, a) ?

tlynn747 commented 3 years ago

I believe this is already the approach for most of these:

sent_id = 2912

text = Céard a tharla do chuid eile díobh?

Céard = root

sent_id = 922

text = 'Cén diabhal útamála atá ar siúl ansin thíos?

Cén = root

sent_id = 1097

text = Cén fáth a ndeachaigh sé amach?

Cén = root

sent_id = 2247

text = Ansin céard a tharlós nuair atá tú sean?

céard = root

sent_id = 475

text = 11 Cad iad na difríochtaí idir an sábh tionúir agus an déadsábh?

Cad = root

But maybe take a look at the extensive discussion around these for v2.5 release and give your feedback on the reasoning behind some choices: https://docs.google.com/document/d/1NpBQaVwr_7Emqbx14SEQmuIxZ7Kk7U3CkWRuOkntQ0o/edit
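
A check like the one above can be scripted. The following is a rough Python sketch that reads CoNLL-U text and reports the root form of each sentence. It assumes the standard ten tab-separated CoNLL-U columns (FORM in column 2, HEAD in column 7). The SAMPLE sentence is a toy annotation of "Cé hé?" following the proposal in this issue, not a line copied from the treebank; its sent_id and the punct attachment are made up for illustration, and the function name roots is hypothetical.

```python
# Toy CoNLL-U input (assumption: illustrative only, not taken from UD_Irish-IDT).
SAMPLE = (
    "# sent_id = toy-1\n"
    "# text = Cé hé?\n"
    "1\tCé\t_\t_\t_\t_\t0\troot\t_\t_\n"
    "2\thé\t_\t_\t_\t_\t1\tnsubj\t_\t_\n"
    "3\t?\t_\t_\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
)

def roots(conllu_text):
    """Yield (sent_id, root_form) for each sentence in CoNLL-U text."""
    sent_id, root_form = None, None
    # A trailing empty line guarantees the last sentence is flushed.
    for line in conllu_text.splitlines() + [""]:
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
        elif line.startswith("#"):
            continue
        elif not line.strip():
            # A blank line ends the current sentence.
            if root_form is not None:
                yield sent_id, root_form
            sent_id, root_form = None, None
        else:
            cols = line.split("\t")
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            if cols[0].isdigit() and cols[6] == "0":
                root_form = cols[1]

found = list(roots(SAMPLE))
print(found)  # [('toy-1', 'Cé')]
```

To run it over the treebank, one would pass the contents of a .conllu file to roots and filter for sentences whose text begins with an interrogative.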

laurenCassidy commented 3 years ago

In Cad a d'ith sí? wouldn't it be

obj(ith, a) rather than mark:prt(ith, a) ?

Good question! I will have to consult this issue first: https://github.com/UniversalDependencies/UD_Irish-IDT/issues/110 as I am unsure about when to use mark:prt vs. nsubj/obj/obl within relative clauses