Closed kwalcock closed 2 years ago
Specifically
[error] java.lang.ArrayIndexOutOfBoundsException: 185
[error] at org.clulab.dynet.Eisner.$anonfun$generateOutput$2(Eisner.scala:248)
[error] at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
[error] at org.clulab.dynet.Eisner.generateOutput(Eisner.scala:244)
[error] at org.clulab.dynet.Eisner.ensembleParser(Eisner.scala:395)
[error] at org.clulab.processors.clu.CluProcessor.parseSentenceWithEisner(CluProcessor.scala:458)
[error] at org.clulab.processors.clu.CluProcessor.$anonfun$parse$1(CluProcessor.scala:773)
[error] at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
[error] at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
[error] at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
[error] at org.clulab.processors.clu.CluProcessor.parse(CluProcessor.scala:772)
[error] at org.clulab.processors.clu.CluProcessor.$anonfun$annotate$1(CluProcessor.scala:234)
[error] at org.clulab.utils.BeforeAndAfter.perform(BeforeAndAfter.scala:10)
[error] at org.clulab.utils.BeforeAndAfter.perform$(BeforeAndAfter.scala:7)
[error] at org.clulab.processors.clu.GivenConstEmbeddingsAttachment.perform(CluProcessor.scala:1009)
[error] at org.clulab.processors.clu.CluProcessor.annotate(CluProcessor.scala:226)
[error] at org.clulab.processors.Processor.annotate(Processor.scala:65)
[error] at org.clulab.processors.Processor.annotate$(Processor.scala:62)
[error] at org.clulab.processors.clu.CluProcessor.annotate(CluProcessor.scala:222)
[error] at org.clulab.processors.examples.ProcessorExample$.main(ProcessorExample.scala:35)
[error] at org.clulab.processors.examples.ProcessorExample.main(ProcessorExample.scala)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:498)
In this code in Eisner.generateOutput
for(i <- scores.indices) {
val relativeHead = scores(i).maxBy(_._2)._1.toInt
val depMod = i + 1
val depHead = if (relativeHead == 0) 0 else depMod + relativeHead
if (depMod >= dependencies.length || depHead >= dependencies(depMod).length) // added
println("It is out of bounds!") // added
val label = dependencies(depMod)(depHead).label
I'm not sure there is any guarantee that the depMod + relativeHead will be within bounds. It might be that the relativeHead needs to be chosen not just with maxBy, but such that ._1 will result in a depHead value within bounds.
Thanks Keith! I'll take a look.
On Tue, Jul 5, 2022 at 22:02 Keith Alcock @.***> wrote:
In this code in Eisner.generateOutput
for(i <- scores.indices) { val relativeHead = scores(i).maxBy(_._2)._1.toInt val depMod = i + 1 val depHead = if (relativeHead == 0) 0 else depMod + relativeHead if (depMod >= dependencies.length || depHead >= dependencies(depMod).length) // added println("It is out of bounds!") // added val label = dependencies(depMod)(depHead).label
I'm not sure there is any guarantee that the depMod + relativeHead will be within bounds. It might be that the relativeHead needs to be chosen not just with maxBy, but such that ._1 will result in a depHead value within bounds.
— Reply to this email directly, view it on GitHub https://github.com/clulab/processors/issues/652#issuecomment-1175401009, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAI75TWM66T3L7IVJ6I73U3VSSBFHANCNFSM5Z5HUS7A . You are receiving this because you are subscribed to this thread.Message ID: @.***>
I think you're right! There is a check for these bounds in the part of the code that handles the common algorithm, but not in this error handling case...
I think this in line:
val depHead = if (relativeHead == 0) 0 else depMod + relativeHead
depHead should be bound to be between [0, sentence length) Thanks!
There is code in Eisner.scala that handles the eventuality of the index being out of range:
The version in toDependencyTable
does not necessarily fill in every value in the Array
that it returns, the Array[Array[Dependency]]
. The assignment dependencies(mod)(head) = Dependency(mod, head, score, j)
is only performed conditionally. That seems to be accounted for later in ensembleParser
by val dep = startingDependencies(i)(j); if(dep != null) {
. However, it does not seem to be checked consistently, because later there is
startingDependencies(mod + 1)(head + 1).score =
lambda * startingDependencies(mod + 1)(head + 1).score + (1 - lambda) * topLabelAndScore._2
that might result in null.score
(AFAICT).
This generateOutput
is also not checking for incoming nulls, but probably should. It returns an IndexedSeq[(Int, String)]
. I think there should be no nulls there because the output goes to parserPostProcessing
where it does _._1
and _._2
without checking for null. To prevent the nulls, I need to fill in a value, even when new code like this runs:
val depHead =
if (dependencies(depMod).indices.contains(oldDepHead)) oldDepHead
else 0 // There should be a better value
0 is not a good answer, because later
val label = dependencies(depMod)(depHead).label
and the dependency there could be null (and is). What value (for head) and label should be used?
Some cyclomatic complexity calculator would flag this code as overly complex, but I'm not in a good position to change that.
Thanks @kwalcock !
On the first issue:
startingDependencies(mod + 1)(head + 1).score =
lambda * startingDependencies(mod + 1)(head + 1).score + (1 - lambda) * topLabelAndScore._2
This is fine. The parent for loop goes over labelTopScores
, which are aligned with startingDependencies
. That is, we will never "hit" a null cell in this loop.
On the second issue: yes, in the else
block ("Eisner failed to produce a complete tree; revert to the greedy inference") we do need to check for nulls. In case a null is present, we need to fill in some value. I propose to use:
0/"root" - for the if(generateRelativeHeads)
condition, and
-1/"root" - for the else
condition.
That is, all these words will be headed by the virtual root token in the sentence.
Addressed with #654.
This seems to crash: