Open maxxkia opened 8 years ago
Please also add a note to the NOTICE.txt file about where you got the new tiger sample from.
How should we deal with neighbouring tokens?
For example, in
w1 w2 w3 w4 w5 w6 w7
if a target is made up of w2 w3
which one is the correct assignment of begin and end for the first element?
a)
begin = w2.begin
end = w3.end
b)
begin = w2.begin
end = w2.end
@reckart any suggestions?
I think for continuous spans, we can just extend the offsets. I think the only problem is non-continuous spans, because we presently do not have a concept to represent these in DKPro Core.
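For the continuous case, "extending the offsets" corresponds to option (a): take the begin of the first token and the end of the last one. A minimal sketch, assuming a hypothetical `Span` class that only carries character offsets (it is not a DKPro Core type, and this is not the reader's actual code):

```java
import java.util.List;

class SpanMerge {
    // Hypothetical stand-in for an annotation with character offsets.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Returns one span covering all given tokens: the smallest begin
    // and the largest end. The order of the input does not matter.
    static Span cover(List<Span> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("no tokens to cover");
        }
        int begin = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (Span t : tokens) {
            begin = Math.min(begin, t.begin);
            end = Math.max(end, t.end);
        }
        return new Span(begin, end);
    }
}
```

Note that for a target made up of non-adjacent tokens, the same computation would also cover everything in between, so it is only meaningful for continuous spans.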
@reckart The problem that I imagined to be discontinuous frame arguments (#895) turned out to be another issue.
Having the following example:
<frame name="SubjectiveExpression" id="s6_f2">
<target>
<fenode idref="s6_3"/>
<fenode idref="s6_2"/>
</target>
<fe name="Source" id="s6_f2_e1">
<flag name="Sprecher">
</flag>
</fe>
<fe name="Target" id="s6_f2_e2">
<fenode idref="s6_4"/>
<fenode idref="s6_503"/>
<fenode idref="s6_5"/>
</fe>
</frame>
When the reader processes the frame element Target (id="s6_f2_e2"), it creates 3 instances of SemArgLink with the role set to Target, each linking to an instance of SemArg that represents the annotation covered by one of the fenodes (i.e. s6_4, s6_503 and s6_5).
These SemArgLinks are accessible as the arguments of a SemPred:
FSArray arguments = element.getArguments();
However, since the instances of SemArgLink belonging to a single argument are not stored in a collection of their own, one has to iterate over all of them to identify the SemArgLink group. One solution would be to iterate over them and group them by their FE name (i.e. Target in this case), but I'm not sure that this value is distinct (can there be two FE in a TigerXml file having the same name but different ids?). Also note that the FE id (i.e. s6_f2_e2), which could be used to uniquely identify arguments, is dropped in TigerXmlReader.
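To illustrate the grouping workaround, here is a rough sketch. The `Link` class is a hypothetical stand-in for SemArgLink (a role name plus the offsets of the linked SemArg), not the DKPro Core type, and the approach assumes that role names are unique within a frame, which is exactly the open question here:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RoleGrouping {
    // Hypothetical stand-in for a SemArgLink and the SemArg it points to.
    static final class Link {
        final String role;
        final int begin;
        final int end;

        Link(String role, int begin, int end) {
            this.role = role;
            this.begin = begin;
            this.end = end;
        }
    }

    // Groups links by role name, keeping the order in which roles first appear.
    // Caveat: two distinct FEs with the same role name end up in one group.
    static Map<String, List<Link>> byRole(List<Link> links) {
        Map<String, List<Link>> groups = new LinkedHashMap<>();
        for (Link link : links) {
            groups.computeIfAbsent(link.role, k -> new ArrayList<>()).add(link);
        }
        return groups;
    }
}
```

If the reader kept the FE id, it could be used as the grouping key instead, which would avoid the ambiguity.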
This problem was raised when I tried to identify the boundaries of sources and targets for subjective expressions.
can there be two fe in a TigerXml file having the same name but different ids?
In principle, yes. There could be two arguments with the same role name.
<fe name="Target" id="s6_f2_e2">
<fenode idref="s6_4"/>
<fenode idref="s6_503"/>
<fenode idref="s6_5"/>
</fe>
I think that if these three are adjacent tokens, they should be merged into a single SemArg span. If they are not adjacent tokens, then we have a discontinuous SemArg. Does that make sense to you?
Only one SemArgLink & SemArg should IMHO be per FE.
Actually, in this example and in many more that I checked manually, the constituents of an FE are adjacent and can be merged. I should write a piece of code to see whether any discontinuous FE exists in my dataset.
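Such a check could look roughly like the sketch below. It assumes that each fenode of an FE has already been resolved to the sentence positions of the tokens it covers (for a non-terminal like s6_503, all tokens below it); that resolution step and the `isContiguous` helper are hypothetical, not part of TigerXmlReader:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ContiguityCheck {
    // True if the token positions form one unbroken run, e.g. [4, 5, 6].
    // Input order does not matter. A repeated position counts as a break,
    // because each token is expected to appear once per FE.
    static boolean isContiguous(List<Integer> tokenPositions) {
        List<Integer> sorted = new ArrayList<>(tokenPositions);
        Collections.sort(sorted);
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i) - sorted.get(i - 1) != 1) {
                return false;
            }
        }
        return true;
    }
}
```

Running this over every FE (and every frame target) in the dataset would list the cases where merging into a single span is not possible.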
Only one SemArgLink & SemArg should IMHO be per FE.
I agree, since I haven't yet seen any example violating this condition.
TigerXmlReader produces the wrong begin and end indexes for the target (SemPred) of a semantic frame when the target is noncontiguous.
For instance in the following sentence:
w1 w2 w3 w4 w5 w6 w7
, if a target consists of w2 and w5 then the corresponding begin and end indexes for target will be wrongly set as:
To fix this issue:
w2
in this case)