Closed margaretha closed 7 years ago
As a non-typed relation of constituency, this will be deserialized as a positional query, so I am not sure it should be serialized as a relation at all.
Could you write up a suggestion for what the positional query would look like?
I think Joachim and I had the following serialization in mind:
{
  "@type" : "koral:group",
  "operation" : "operation:position",
  "frames" : ["frames:startsWith", "frames:endsWith", "frames:isAround", "frames:matches"],
  "boundary" : {
    "@type" : "koral:boundary",
    "min" : 0,
    "max" : 3
  },
  "operands" : [X,Y]
}
The "boundary" attribute may end up with a different name, though. It describes the minimum and maximum depth between X and Y.
I like the depth definition, but operation:sequence does not seem quite right.
Dominance looks more like a specific relation since its behaviour is very similar to a labelled pointing relation with an implicit label dominance. But it is more on the constituency/hierarchy/vertical level, while relation is not bound to any level (typically used for anaphoric/coreference relations).
Edit: operation:relation without relType should be fine. Besides, we should also handle attribute/edge type of the dominance. See #30
I already fixed the operation directly after editing. ;)
Relations need specific annotations, so I don't think it's useful here.
Oh I have to refresh the page manually. But why frames?
Edit: Do you mean koral:relation? But relType is optional, so we don't need koral:relation here. We can also make the wrap in koral:relation optional and add an optional attr.
Currently ["frames:contains"] is the default for frames in operation:position, but as this is not a real frame but a shortcut for frames, I think we will remove it in favor of the 13-frame system Joachim already used.
With relations I meant relations in Krill. How would you serialize dominance without a relType using operation:relation?
Currently ["frames:contains"] is the default for frames in operation:position, but as this is no real frame, but a shortcut for frames, ...
I mean why do we need to define the frames to solve dominance?
I suppose this should be enough
{
  "@type" : "koral:group",
  "operation" : "operation:relation",
  "depth" : {
    "@type" : "koral:boundary",
    "min" : 0,
    "max" : 3
  },
  "operands" : [X,Y]
}
No, in Krill it should not be handled using RelationSpans and there shouldn't be a problem in adjusting the deserialization, right?
Hm - I think operation:position here would be more meaningful. Otherwise it's an unlabeled relation with a "depth" called boundary ... This could also be a relation in a dependency tree.
Otherwise we have to specify koral:relation as in my first suggestion. The relation is indeed implicit.
For operation:position, it should have
I think, as you proposed, having a koral:boundary attribute with the name depth should be enough, don't you think?
I mean an attribute/edge-type like >[func="sbj"]. Maybe it depends on the corpus data but it exists in AQL.
If a label exists, we can use operation:relation of course, but an undefined relation seems a bit weird to me. In case it's a hierarchical relation, I prefer operation:position.
The operator > is always dominance; other relations are defined with the operator ->, so >[func="sbj"] is also a hierarchical relation. I am fine with operation:position, but I think > and >[func="sbj"] should not be serialized into two different koral operations when they are meant to be identical operations in AQL, the latter simply adding a type/attribute.
Ah - I see. Okay - yes, that makes sense. You are right - we shouldn't have different serializations for similar constructs.
So both of them with operation:position or operation:relation?
Either way, we need an additional attr like in relType for operation:relation. I am not sure where we can add it for operation:position.
I think the many frames definition
"frames" : ["frames:startsWith", "frames:endsWith", "frames:isAround", "frames:matches"],
for hierarchical position does not really make sense. Firstly, it is not in the Annis query itself. Secondly, the frames are supposed to define the position of the two operands, which is vertical/hierarchical, not startsWith etc. Thirdly, they are not really used for solving the query, are they?
operation:relation, although I dislike that "hierarchical dominance" is somehow implicit then.
And: Yes, they are used for solving the query.
How about defining a new operation? Using operation:relation is actually our own interpretation since the "dominance" syntax is similar to a relation.
So you would vote for operation:dominance or operation:hierarchy?
Sorry for my late reply. I have just found your answer; I didn't receive emails regarding these comments.
Hmm, operation:hierarchy sounds more general than operation:dominance. Dominance very much refers to Annis.
That means you would prefer hierarchy? We may then be able to reuse this for, e.g., CSS selector queries, in case we want to support that.
yes, it's good if we can reuse it!
Dominance queries will now be serialized with operation:hierarchy. For example, the query
node > cnx/c="np"
is serialized as follows:
{
  "@context": "http://korap.ids-mannheim.de/ns/koral/0.3/context.jsonld",
  "query": {
    "operation": "operation:hierarchy",
    "operands": [
      {"@type": "koral:span"},
      {
        "@type": "koral:span",
        "layer": "c",
        "foundry": "cnx",
        "match": "match:eq",
        "key": "np"
      }
    ],
    "@type": "koral:group"
  }
}
Dominance query with boundary: node & node & #1 >2,4 #2
{
  "@context": "http://korap.ids-mannheim.de/ns/koral/0.3/context.jsonld",
  "query": {
    "operation": "operation:hierarchy",
    "operands": [
      {"@type": "koral:span"},
      {"@type": "koral:span"}
    ],
    "@type": "koral:group",
    "boundary": {
      "min": 2,
      "max": 4,
      "@type": "koral:boundary"
    }
  }
}
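The two serializations above share one shape and differ only in the optional boundary. A minimal Python sketch of that shape follows; `span` and `hierarchy_group` are hypothetical helper names used for illustration, not part of Koral, and the "@context" wrapper is left out.

```python
import json

def span(foundry=None, layer=None, key=None):
    """Build a koral:span; with no key it is the empty span used for AQL `node`."""
    s = {"@type": "koral:span"}
    if key is not None:
        s.update({"foundry": foundry, "layer": layer, "match": "match:eq", "key": key})
    return s

def hierarchy_group(parent, child, min_depth=None, max_depth=None):
    """Build an operation:hierarchy group, adding a boundary only when a depth range is given."""
    group = {
        "@type": "koral:group",
        "operation": "operation:hierarchy",
        "operands": [parent, child],
    }
    if min_depth is not None and max_depth is not None:
        group["boundary"] = {"@type": "koral:boundary", "min": min_depth, "max": max_depth}
    return group

# node > cnx/c="np"
simple = hierarchy_group(span(), span("cnx", "c", "np"))
# node & node & #1 >2,4 #2
bounded = hierarchy_group(span(), span(), 2, 4)

print(json.dumps(simple, indent=2))
print(json.dumps(bounded, indent=2))
```

Keeping the boundary optional means plain `>` and `>2,4` stay one operation, in line with the earlier point that similar AQL constructs should not get different serializations.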
Dominance query with label: "Mann" & node & #2 >[func="SBJ"] #1
I am not quite sure about the serialization for this query. In Annis, func="SBJ" is the same as const:func="SBJ". I think this means that func is a constituent layer, so I would serialize it as:
{
  "@context": "http://korap.ids-mannheim.de/ns/koral/0.3/context.jsonld",
  "query": {
    "operation": "operation:hierarchy",
    "operands": [
      {"@type": "koral:span"},
      {
        "wrap": {
          "@type": "koral:term",
          "layer": "orth",
          "match": "match:eq",
          "key": "Mann"
        },
        "@type": "koral:token"
      }
    ],
    "@type": "koral:group",
    "label": {
      "@type": "koral:term",
      "layer": "c",
      "match": "match:eq",
      "key": "SBJ"
    }
  }
}
Joachim noted that a "c"-layer term (constituency relation/dominance) is needed, so the label would be a termGroup:
"label": {
  "operands": [
    {
      "@type": "koral:term",
      "layer": "func",
      "match": "match:eq",
      "key": "SB"
    },
    {
      "@type": "koral:term",
      "layer": "c"
    }
  ],
  "@type": "koral:termGroup",
  "relation": "relation:and"
}
What do you think?
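The two label shapes discussed above (a single term, or a termGroup that adds the "c"-layer term) could be produced along these lines. This is only a sketch: `term` and `and_label` are hypothetical helpers, and whether the extra "c" term is needed at all is still an open question in this thread.

```python
def term(layer, key=None):
    """Build a koral:term; without a key only the layer is set."""
    t = {"@type": "koral:term", "layer": layer}
    if key is not None:
        t.update({"match": "match:eq", "key": key})
    return t

def and_label(*terms):
    """Return a single term unchanged, or wrap several terms in an AND termGroup."""
    if len(terms) == 1:
        return terms[0]
    return {
        "@type": "koral:termGroup",
        "relation": "relation:and",
        "operands": list(terms),
    }

# First variant: the label is a single term on the "c" layer.
single = and_label(term("c", "SBJ"))
# Second variant: a func term combined with a bare "c"-layer term.
grouped = and_label(term("func", "SB"), term("c"))
print(grouped)
```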
I think the first serializations look fine - and when ignoring the empty spans, they are deserializable in Krill. Some comments:
Can dominance span multiple foundries/layers, or should the undefined nodes have the same foundry and layer as the other operands?
I am not sure, but Annis doc section 4.6 suggests it can be "any" node or annotation, so I think multiple foundries/layers are ok.
In the third example there is a dominance relation with a surface term - what does that mean?
Why not? A node can be anything such as text, can't it?
- I think "relType" and "label" are quite similar, so I guess we should merge their usage.
Ok good idea!
- Regarding the last serialization: Is "func" really a layer? That looks weird ...
well, according to the ANNIS doc, "func" is an annotation and I believe it is supposed to be a functional dependency annotation. In the example const:func, const is a namespace and means constituent, but I don't think it has to be constituent. So the namespace allows for multiple layers. I don't really get why "c"-layer term is needed as Joachim suggested.
Btw, namespaces are not handled yet. Should we support this?
Hm - as they only show examples with single trees, I would say no. I don't even know how this could be handled.
Ah, right. But the addition of foundry and layer in the case of arbitrary nodes, should be done in Kustvakt rewriting, right?
I don't say it's not possible, but I guess the term needs to be part of the tree (in that case a leaf node). So it needs to be indexed as part of the tree structure to have a depth (and be a span), meaning it needs the same foundry/layer. Or am I missing something? Otherwise this would mean there is a meaningful alignment of different hierarchies in the document's annotation, and I don't know how this is possible. Though: for leaf nodes I accept that a translation to the relevant foundry/layer can be implicit.
That it needs the same foundry/layer makes sense. But should this be restricted in Koral as a matter of syntax?
I would say, const is the layer, func is the key, SB is the value, though the translation would not match our index at all in Krill.
Oh that's cool. Hm that depends on the annotation data. Do we have any hierarchy annotations in our data at all? How about we support this format: [foundry/layer:key=value] ?
Btw, there is another parameter. A dominance edge may be given a type, such as "secondary edge" when a node has more than one child node, e.g. >secedge[foundry/layer:key=value]. In this case, it seems the ranking of the edges (primary, secondary) is explicitly annotated. Would such a type exist in our annotations?
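To make the proposed [foundry/layer:key=value] notation concrete, here is a rough Python sketch of how such a label could be split into its parts. The pattern and the function name `parse_edge_label` are assumptions made for illustration; nothing here reflects an existing Koral parser, and treating the foundry as optional is a guess.

```python
import re

# Hypothetical pattern for the proposed [foundry/layer:key=value] notation.
# The foundry part and the quotes around the value are treated as optional.
LABEL_RE = re.compile(
    r'^\[(?:(?P<foundry>[^/:\]]+)/)?(?P<layer>[^:\]]+):(?P<key>[^=\]]+)="?(?P<value>[^"\]]*)"?\]$'
)

def parse_edge_label(text):
    """Split an edge label into foundry, layer, key and value."""
    m = LABEL_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a [foundry/layer:key=value] label: {text!r}")
    return m.groupdict()

print(parse_edge_label('[cnx/c:func="SB"]'))
print(parse_edge_label('[c:func=SB]'))
```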
Can dominance span multiple foundries/layers, or should the undefined nodes have the same foundry and layer as the other operands?
There was at least an example with 2 different layers: constituent and POS.
cat="NP" > pos="RB"
where RB is an adverb.
ps: sry, I've just realized that I answered directly in your last comment.
Hi Eliza, a quick note: secondary edges are not for additional children. They are a legacy concept from the early Tiger XML datamodel, where primary edges indicated either constituency or dependency in the relational-grammar sense, while secondary edges expressed non-local relations, such as co-reference (and probably also "movement" in some grammatical models). In later models, there was a unified concept of edge with different labels (and maybe with separately indicated different functions, but I can't recall that clearly now).
nd> Do we have any hierarchy annotations in our data at all?
I think that at least XIP annotations indicated both hierarchy and dependency (dependencies were sometimes defined not over terminals but over a terminal vs. a phrase).
Hi Piotr, thanks for the clarification about secondary edges! I cannot really imagine what secondary edges would be in the hierarchical sense though.
Annis has separate operators for dominance (hierarchy) and relations (e.g dependency & co-reference).
em> I cannot really imagine what secondary edges would be in the hierarchical sense though.
They weren't used for hierarchies. They were "secondary" exactly because they violated the basic ("primary"?) Tiger tree model based on dominance. So, for example, a secondary edge could link "him" to "her boyfriend" in a sentence "Her boyfriend is overall a nice guy, but I still don't like him", where there is definitely no hierarchical relationship between the two nodes in question.
They weren't used for hierarchies. They were "secondary" exactly because they violated the basic ("primary"?) Tiger tree model based on dominance.
Then it is strange to have such a type in Annis dominance while it also has a pointing relation operator that is more appropriate for secondary edges. Do you think it was added specifically to support data using this early Tiger XML data model?
Oh gosh, I have completely no idea and can't investigate this at the moment (still not done with my contribution to tomorrow's evaluation ;-)). But I know a perfect person to ask, if he has the time and can be bothered: @amir-zeldes (whom I can't apparently reference from here because of some formal reasons :-/ )
Hi @bansp, the reference seems to work fine! I got this message anyway. I'll try to answer below:
Secedges in the Tiger corpus expressed forms of structure sharing that violated the unique parent assumption, for example right node raising, gapping, etc. As such, they are considered to be proper dominance relations (they imply inherited coverage, unlike pointing relations). You can see an ANNIS example from the Potsdam Commentary Corpus here (open the constituents view): cat="S" >secedge tok="was"
However, the device of edge typing is more general than that in ANNIS, which is based on Salt. In practice it usually follows the modeling strategies used by PAULA XML so maybe looking at those is a better way to understand things (see the PAULA documentation). In a nutshell, ANNIS edges have:
[...] > and the named operator >edgetype.
[...] >*) or a specific type (>rst*). For this reason, dom edges must be cycle-free across all minor types. Pointing relations may cycle across different minor types, but for this reason, only typed indirect queries are supported (->coref*). The need to query either 'edge' or 'secedge' (e.g. cat="NP" > cat="PP", over whatever edge), is one of the reasons why secedges are dominance relations.
I hope that answers the question and gives an idea of the data model - if anything is unclear just let me know.
Hi @amir-zeldes, thank you for your explanation! I am not sure how multiple annotations are separated or if they have boolean operations in an edge label since I cannot find it in the Annis 3 documentation. Could you please give an example?
You're right, AQL doesn't provide any special syntax for boolean operations on edge annotations, and TBH I think we also don't really have an example corpus containing multiple edge annotations. That said, you can get that kind of behavior with a more complex query. If you want boolean AND, you can simply repeat the relation declaration. There is no reflexivity constraint on the edge connection, so this will match AND:
x & y & #1 ->rel[anno1="val"] #2 & #1 ->rel[anno2="val"] #2
It's not exactly elegant, but it should work. For OR you can use the general disjunction operator |, either as a full query disjunction or, in more recent ANNIS versions, as a relation disjunction:
Relation disjunction:
tok & tok & (#1 ->dep[func="nsubj"] #2 | #1 ->dep[func="nsubjpass"] #2)
Full query disjunction:
tok & tok & #1 ->dep[func="nsubj"] #2 | tok & tok & #1 ->dep[func="nsubjpass"] #2
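The AND and OR patterns above are mechanical enough to generate. The following Python sketch only builds AQL strings in the shapes shown in this comment; `edge_and` and `edge_or` are hypothetical helper names, and the output is not validated against ANNIS.

```python
def _edge(op, key, value, src, dst):
    """Format one annotated edge constraint, e.g. #1 ->dep[func="nsubj"] #2."""
    return f'#{src} {op}[{key}="{value}"] #{dst}'

def edge_and(op, annos, src=1, dst=2):
    """Boolean AND: repeat the relation declaration once per annotation."""
    return " & ".join(_edge(op, k, v, src, dst) for k, v in annos)

def edge_or(op, annos, src=1, dst=2):
    """Boolean OR: relation disjunction (more recent ANNIS versions)."""
    return "(" + " | ".join(_edge(op, k, v, src, dst) for k, v in annos) + ")"

and_query = "x & y & " + edge_and("->rel", [("anno1", "val"), ("anno2", "val")])
or_query = "tok & tok & " + edge_or("->dep", [("func", "nsubj"), ("func", "nsubjpass")])
print(and_query)
print(or_query)
```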
nd> Do we have any hierarchy annotations in our data at all?
@bansp: Yes, we have constituency annotations from CoreNLP and dependency annotations from MALT.
Dominance is serialized as a relation with the layer c and without a key, which is not a valid koral:term object.
Suggestion:
node & node & #2 > #1
node & node & #2 ->dominance #1
could be serialized identically as a relation with key dominance.