vanatteveldt closed this issue 8 years ago
About the two problems:
1) The problem of empty paths from a term to the root: the main cause is the tokeniser, which is not splitting the sentences properly. The dependency parser (Alpino) runs on these malformed sentences and generates quite a lot of nonsensical dependencies that do not form a valid dependency tree. For instance, in the example input, all of these tokens are assigned to the same sentence, even though they belong to two different paragraphs:
<wf id="w1" length="6" offset="0" para="1" sent="1">PostNL</wf>
<wf id="w2" length="4" offset="7" para="1" sent="1">gaat</wf>
<wf id="w3" length="4" offset="12" para="1" sent="1">naar</wf>
<wf id="w4" length="4" offset="17" para="1" sent="1">3000</wf>
<wf id="w5" length="12" offset="22" para="1" sent="1">afhaalpunten</wf>
<wf id="w6" length="4" offset="36" para="1" sent="1">door</wf>
<wf id="w7" length="5" offset="41" para="1" sent="1">Wilko</wf>
<wf id="w8" length="8" offset="47" para="1" sent="1">Voordouw</wf>
<wf id="w9" length="9" offset="57" para="2" sent="1">AMSTERDAM</wf>
<wf id="w10" length="1" offset="67" para="2" sent="1">-</wf>
<wf id="w11" length="2" offset="71" para="2" sent="1">Op</wf>
<wf id="w12" length="5" offset="74" para="2" sent="1">korte</wf>
<wf id="w13" length="7" offset="80" para="2" sent="1">termijn</wf>
<wf id="w14" length="4" offset="88" para="2" sent="1">moet</wf>
<wf id="w15" length="3" offset="93" para="2" sent="1">het</wf>
<wf id="w16" length="6" offset="97" para="2" sent="1">aantal</wf>
<wf id="w17" length="19" offset="104" para="2" sent="1">PostNL-afhaalpunten</wf>
<wf id="w18" length="4" offset="124" para="2" sent="1">voor</wf>
<wf id="w19" length="9" offset="129" para="2" sent="1">pakketjes</wf>
<wf id="w20" length="3" offset="139" para="2" sent="1">van</wf>
<wf id="w21" length="4" offset="143" para="2" sent="1">ruim</wf>
<wf id="w22" length="4" offset="148" para="2" sent="1">2000</wf>
<wf id="w23" length="4" offset="153" para="2" sent="1">naar</wf>
<wf id="w24" length="4" offset="158" para="2" sent="1">3000</wf>
<wf id="w25" length="1" offset="162" para="2" sent="1">.</wf>
I added some exception handling for these cases, but the “real” problem lies in the sentence splitting.
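A quick way to spot this kind of input before it reaches the parser is to look for sentence ids that span more than one paragraph. The following is only a minimal sketch: the function name `sentences_spanning_paragraphs` is hypothetical and not part of any module mentioned here, and the snippet wraps a few of the `wf` elements above in a made-up `<text>` element so that it parses on its own.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A few of the tokens quoted above, wrapped in a <text> element
# (the wrapper is an assumption made so the snippet is self-contained).
NAF_SNIPPET = """<text>
<wf id="w7" length="5" offset="41" para="1" sent="1">Wilko</wf>
<wf id="w8" length="8" offset="47" para="1" sent="1">Voordouw</wf>
<wf id="w9" length="9" offset="57" para="2" sent="1">AMSTERDAM</wf>
<wf id="w10" length="1" offset="67" para="2" sent="1">-</wf>
</text>"""


def sentences_spanning_paragraphs(root):
    """Return {sent_id: sorted paragraph ids} for sentences covering more than one paragraph."""
    paras_by_sent = defaultdict(set)
    for wf in root.iter("wf"):
        paras_by_sent[wf.get("sent")].add(wf.get("para"))
    return {sent: sorted(paras) for sent, paras in paras_by_sent.items() if len(paras) > 1}


suspicious = sentences_spanning_paragraphs(ET.fromstring(NAF_SNIPPET))
print(suspicious)  # {'1': ['1', '2']}
```

A non-empty result does not prove the dependency layer is broken, but it is a cheap signal that the sentence splitting went wrong for that file.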
2) About the key error with t_841.
This was related to the way the module selects the root node of a sentence. Basically, it was based on 2 heuristics:
1) Select the nodes with the smallest number of dependency relations arriving TO them
2) Select the nodes with the largest number of dependency relations starting FROM them
In some cases there can be a tie between several nodes. For instance, in the example with t_841, we have these dependencies:
<!--whd/body(Hoe,zit)-->
<dep from="t_838" rfunc="whd/body" to="t_839"/>
<!--hd/predc(zit,Hoe)-->
<dep from="t_839" rfunc="hd/predc" to="t_838"/>
<!--hd/su(zit,dat)-->
<dep from="t_839" rfunc="hd/su" to="t_840"/>
<!--- - / - -(Hoe,?)-->
<dep from="t_838" rfunc="-- / --" to="t_841"/>
<!--dp/dp('In,zoektocht)-->
So the 2 heuristics together left 2 tied candidates.
I added a third heuristic that breaks this tie by selecting as root the term that is tagged as a verb, in this case t_839 (ZITTEN).
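To make the tie concrete, here is a rough sketch of the three heuristics applied to the dependencies quoted above. It is not the module's actual code: `select_root`, the `(from, to)` tuple layout, the `"verb"` tag value, and the assumption that the heuristics are applied one after another are all illustrative choices. Only the tag of t_839 is taken from this thread; the other terms are left untagged.

```python
from collections import Counter


def select_root(term_ids, deps, pos_by_term):
    """Pick a root term using the three heuristics (hypothetical helper).

    term_ids: ordered list of term ids in the sentence
    deps: list of (from_id, to_id) pairs
    pos_by_term: dict mapping term id -> POS tag (assumed tag name: "verb")
    """
    incoming = Counter(to_id for _, to_id in deps)
    outgoing = Counter(from_id for from_id, _ in deps)

    # Heuristic 1: keep the terms with the fewest incoming relations.
    min_in = min(incoming[t] for t in term_ids)
    candidates = [t for t in term_ids if incoming[t] == min_in]

    # Heuristic 2: among those, keep the terms with the most outgoing relations.
    max_out = max(outgoing[t] for t in candidates)
    candidates = [t for t in candidates if outgoing[t] == max_out]

    # Heuristic 3: on a tie, prefer a term tagged as a verb.
    if len(candidates) > 1:
        verbs = [t for t in candidates if pos_by_term.get(t) == "verb"]
        if verbs:
            candidates = verbs

    return candidates[0]


terms = ["t_838", "t_839", "t_840", "t_841"]
deps = [
    ("t_838", "t_839"),  # whd/body(Hoe,zit)
    ("t_839", "t_838"),  # hd/predc(zit,Hoe)
    ("t_839", "t_840"),  # hd/su(zit,dat)
    ("t_838", "t_841"),  # -- / --(Hoe,?)
]
root = select_root(terms, deps, {"t_839": "verb"})
print(root)  # t_839
```

In this excerpt every term has exactly one incoming relation, and t_838 and t_839 both have two outgoing ones, so the first two heuristics cannot separate them; only the verb check does.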
Ruben Izquierdo Bevia Vrije University of Amsterdam ruben.izquierdobevia@vu.nl http://rubenizquierdobevia.com/
On 18 May 2016, at 23:56, Wouter van Atteveldt <notifications@github.com> wrote:
I seem to get this error occasionally when running the multilingual factuality.
Traceback (most recent call last):
File "/data/wva/newsreader_pipe_nl/modules/multilingual_factuality/feature_extractor/rule_based_factuality.py", line 413, in
An example input file that causes the error can be found here: http://i.amcat.nl/keyerror.naf
Wasn't the tokenizer problem solved in https://github.com/cltl/morphosyntactic_parser_nl/pull/7 ? I was fairly confident that I updated the parser before running, but I'll try again.
(btw, you can probably close this if you fixed problem 2, as problem 1 is really issue #3?)