Closed by Akron 1 month ago
That is more or less in line with what the LingSIG has been pondering as an addition to the shallow grammatical description tools. It's a nice nudge for the group, too -- thanks, @luengen !
@bansp That's great - can you link to a description of this specification/proposal?
At last, here is the test file. Is it ok? SKU21.head.i5.xml.zip
Note that besides @head and @deprel, another attribute called @msd is proposed, which, unlike @head and @deprel, is already in the official TEI.
<s>
<w n="1" lemma="Fake" pos="N" head="2" deprel="name" msd="SUBCAT_Prop|CASECHANGE_Up|OTHER_UNK">Fake</w>
<w n="2" lemma="News" pos="N" head="3" deprel="name" msd="SUBCAT_Prop|CASECHANGE_Up|OTHER_UNK">News</w>
<w n="3" lemma="media" pos="N" head="0" deprel="ROOT" msd="NUM_Sg|CASE_Nom|CASECHANGE_Up">Media</w>
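For anyone who wants to inspect these attributes programmatically, here is a minimal sketch that reads the three tokens quoted above. It rests on some assumptions: the closing `</s>` is added only so the fragment parses on its own, `read_tokens` is a hypothetical helper name, and the reading of @msd as pipe-separated `KEY_Value` pairs (with `_` meaning "no features") is inferred from the sample values rather than from any specification.

```python
import xml.etree.ElementTree as ET

# The three tokens from the excerpt; </s> is added here for illustration only.
SAMPLE = """<s>
<w n="1" lemma="Fake" pos="N" head="2" deprel="name" msd="SUBCAT_Prop|CASECHANGE_Up|OTHER_UNK">Fake</w>
<w n="2" lemma="News" pos="N" head="3" deprel="name" msd="SUBCAT_Prop|CASECHANGE_Up|OTHER_UNK">News</w>
<w n="3" lemma="media" pos="N" head="0" deprel="ROOT" msd="NUM_Sg|CASE_Nom|CASECHANGE_Up">Media</w>
</s>"""


def read_tokens(sentence_xml):
    """Return one dict per <w> element (hypothetical helper)."""
    tokens = []
    for w in ET.fromstring(sentence_xml).iter("w"):
        msd = w.get("msd", "_")
        feats = {}
        if msd != "_":
            # Assumed convention: features separated by "|", key and value by the first "_".
            for feat in msd.split("|"):
                key, _, value = feat.partition("_")
                feats[key] = value
        tokens.append({
            "n": int(w.get("n")),
            "form": w.text,
            "lemma": w.get("lemma"),
            "pos": w.get("pos"),
            "head": int(w.get("head")),
            "deprel": w.get("deprel"),
            "msd": feats,
        })
    return tokens


for tok in read_tokens(SAMPLE):
    print(tok["n"], tok["form"], "->", tok["head"], tok["deprel"], tok["msd"])
```

Note that @head is interpreted relative to the @n values inside the same `<s>`, so the sketch keeps both as integers.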
I'll give it a try. The self-referentiality of e.g.
<p xml:lang="x-|xxx:1|">
<s xml:lang="xxx">
<w deprel="ROOT" head="0" lemma="uo" msd="_" n="1" pos="Num">uo</w>
</s>
</p>
is important?
How is it self referential?
Maybe I am misreading that. What should head="0" mean?
In a dependency tree, there is always one particular word at the root of the tree (usually the finite verb). This root word is, strictly speaking, not dependent on another word. In the present analyses, the root word seems to be consistently marked with head=0 and deprel=ROOT, apparently to ensure that every column has a value. I thought this was common for UD annotations in CoNLL-U.
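The convention described here can be checked mechanically. The following is a minimal sketch, assuming each sentence is given as a list of `(n, head)` pairs and that exactly one token per sentence carries head=0; `find_root` is a hypothetical name, not part of any existing tool.

```python
def find_root(tokens):
    """Return the n of the single root token in one sentence.

    tokens: list of (n, head) integer pairs, where head == 0 marks the root.
    Raises ValueError if the sentence has no root, several roots, or a head
    that does not match any n in the same sentence.
    """
    ids = {n for n, _ in tokens}
    roots = [n for n, head in tokens if head == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    for n, head in tokens:
        if head != 0 and head not in ids:
            raise ValueError(f"token {n} points to unknown head {head}")
    return roots[0]


# The three-token sample from this thread: 1 -> 2 -> 3 -> 0
print(find_root([(1, 2), (2, 3), (3, 0)]))
```

The value 0 never has to exist as an @n for this check to pass, which is the point at issue in the rest of the discussion.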
Yes, but that means it's self-referential. Or it can't be indexed (i.e. found).
If I remember correctly, there are also annotations that refer to the sentence-span for root.
Yes, that's how one could understand the head=0. Do you mean we should add n="0" to every <s> to make it explicit? That seems redundant, and it is not in the source either.
No - my question is really how this should be interpreted. 0 (and with it root) could be completely ignored (simple; not queryable), 0 could refer to the same token (self-referential; simple), or 0 could refer to the embedding span (not so simple and maybe not so queryable).
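To make the three readings concrete, here is a minimal sketch of how a converter might resolve the target of a relation under each of them. Everything in it is hypothetical: the policy names `"ignore"`, `"self"` and `"span"`, the `resolve_head` function, and the tuple-based target representation are illustrations only and say nothing about how KorAP or conllu2korapxml work internally.

```python
def resolve_head(n, head, policy):
    """Return the target of token n's dependency relation.

    Targets are ("token", k) for the k-th token of the sentence,
    ("sentence", None) for the embedding sentence span,
    or None if no relation should be indexed at all.
    """
    if head != 0:
        return ("token", head)      # ordinary relation: point at another token
    if policy == "ignore":
        return None                 # option 1: drop the root relation
    if policy == "self":
        return ("token", n)         # option 2: root token points at itself
    if policy == "span":
        return ("sentence", None)   # option 3: root token points at the sentence span
    raise ValueError(f"unknown policy: {policy}")


for policy in ("ignore", "self", "span"):
    print(policy, resolve_head(3, 0, policy))
```

Only the second and third options leave the root relation in the index; they differ in whether a query for it has to match a token or a span.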
> @bansp That's great - can you link to a description of this specification/proposal?
Nope, not yet, as far as a coherent suggestion from the group is concerned. I'll see if I can find some links to others' proposals in the meeting minutes (it's been a while since the last one).
> No - my question is really how this should be interpreted. 0 (and with it root) could be completely ignored (simple), 0 could refer to the same token (self-referential; simple), or 0 could refer to the embedding span (not so simple and maybe not so queryable).
But that's not a clear question :-) , there's at least one hidden assumption here -- interpreted by what, for what purpose? In the dependency grammar used here, each relation is labelled, and the one targeting "0" has the label "root". It is not a reflexive relation. "0" is like "/" in XPath, it's not a word, just an abstract node (just like the root node in an XML tree is not the root element -- it is the abstract parent of the root element). The label "root" basically says "start here". I'm guessing that your question asks how the KorAP engine should interpret this internally, for the purpose of indexing (correct?). I don't have a clear answer to that, because I know too little. :-\
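The XML half of this analogy can be observed directly in a DOM. A minimal sketch using Python's standard `xml.dom.minidom`, with a one-token sentence adapted from the test data quoted earlier in this thread:

```python
from xml.dom import Node, minidom

doc = minidom.parseString('<s><w n="1" head="0" deprel="ROOT">uo</w></s>')
root_element = doc.documentElement

# The root element <s> does have a parent: the abstract document node.
parent_is_document = root_element.parentNode is doc
# That parent is a document node, not an element ...
parent_is_element = root_element.parentNode.nodeType == Node.ELEMENT_NODE
# ... and it has no parent of its own.
document_parent = doc.parentNode

print(parent_is_document, parent_is_element, document_parent)
```

In the same way, head="0" names a node that sits above the root word without being a word itself.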
Not really how it should work internally but - user-centric - how the data should be queryable. There are three options possible afaik.
Naively, I'd look towards ANNIS for guidance -- at least it's tested. And it is general, also handling constituent trees. Ah, or MTAS, of course. I am not sure if POLIQARP has received some new relation-oriented extensions, but employing them would mean a separate addition to KorAP anyway, I guess.
But you have dependency annotations in KorAP already, and I would assume they have a root word with head=0 in just the same way. I would also assume that conllu2korapxml already has a particular treatment for these and that they can be queried in a certain way. Can't we derive the answer from there?
Unfortunately, I don't know how dependency relations are currently queried in KorAP.
Existing root relations coming from conllu2korapxml point to the sentence span. I am fine with that.
That seems to be a consistent way of handling this in a span-oriented setting. Why not go for that and see what happens.
Yes, it seems to make sense.
In addition to inline token support, inline dependencies may also be supported in a limited (i.e. token-based and sentence-scoped) form. Proposal by Harald Lüngen:
Requested by Harald Lüngen.