clulab / reach

Reach Biomedical Information Extraction

NxmLReader problem probably associated to assembly #718

Closed enoriega closed 3 years ago

enoriega commented 3 years ago

When I turn on assembly in the config, I sometimes see an NxmlReader error that I didn't see otherwise. The stack trace points back to some of the assembly methods. I suspect there is an unexpected corner case with this class of files that crashes their processing.

The error is not catastrophic, as processing of the other files carries on.

I attach the stack trace of the exception and an nxml file that triggers it for replication purposes.

PMC6797981.nxml.txt

Stack trace:

 ¡¡¡ NxmlReader error !!!                                                      

paper: PMC6797981

error:                                                                         
java.lang.reflect.InvocationTargetException

stack trace:                                                                   
sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror2.jinvokeraw(JavaMirrors.scala:398)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaMethodMirror.jinvoke(JavaMirrors.scala:354)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror.apply(JavaMirrors.scala:370)
org.clulab.odin.impl.ActionMirror.$anonfun$reflect$1(ActionMirror.scala:23)
org.clulab.odin.impl.GraphExtractor.findAllIn(Extractor.scala:111)
org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1(Extractor.scala:20)
org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1$adapted(Extractor.scala:19) 
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
scala.collection.immutable.Range.foreach(Range.scala:158)
scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
org.clulab.odin.impl.Extractor.findAllIn(Extractor.scala:19)
org.clulab.odin.impl.Extractor.findAllIn$(Extractor.scala:18)
org.clulab.odin.impl.GraphExtractor.findAllIn(Extractor.scala:99)
org.clulab.odin.ExtractorEngine.$anonfun$extractFrom$2(ExtractorEngine.scala:45)
scala.collection.TraversableLike$WithFilter.$anonfun$flatMap$2(TraversableLike.scala:773)
scala.collection.Iterator.foreach(Iterator.scala:941)
scala.collection.Iterator.foreach$(Iterator.scala:941)
scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
scala.collection.IterableLike.foreach(IterableLike.scala:74)
scala.collection.IterableLike.foreach$(IterableLike.scala:73)
scala.collection.AbstractIterable.foreach(Iterable.scala:56)
scala.collection.TraversableLike$WithFilter.flatMap(TraversableLike.scala:772)
org.clulab.odin.ExtractorEngine.extract$1(ExtractorEngine.scala:43)
org.clulab.odin.ExtractorEngine.loop$1(ExtractorEngine.scala:34)
org.clulab.odin.ExtractorEngine.extractFrom(ExtractorEngine.scala:56)
org.clulab.odin.ExtractorEngine.extractByType(ExtractorEngine.scala:63)
org.clulab.reach.ReachSystem.extractEventsFrom(ReachSystem.scala:213)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:89)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:155)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:149)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:73)
org.clulab.reach.PaperReader$.getMentionsFromEntry(PaperReader.scala:144)
org.clulab.reach.ReachCLI.processPaper(ReachCLI.scala:136)
org.clulab.reach.ReachCLI.$anonfun$processPapers$3(ReachCLI.scala:90)
org.clulab.reach.ReachCLI.$anonfun$processPapers$3$adapted(ReachCLI.scala:84)
scala.collection.parallel.AugmentedIterableIterator.map2combiner(RemainsIterator.scala:116)
scala.collection.parallel.AugmentedIterableIterator.map2combiner$(RemainsIterator.scala:113)
scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:66)
scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1056)
scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1053)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:170)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal$(Tasks.scala:157)
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:440)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:150)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

kwalcock commented 3 years ago

This stack trace is deceptive, probably because of the reflection involved. It is hiding an array index out of bounds exception which is thrown from LinguisticPolarityEngine line 46. At that point there is a sentence with a token interval [3, 29] but incoming edges only from [0, 28].

      val prepc_byed = (evt.tokenInterval filter (tok => deps.getIncomingEdges(tok).map(_._2).contains("advcl_by"))).toSet
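
A minimal sketch of how reflection can mask the underlying exception, and how unwrapping the cause chain reveals it. The names `Actions`, `lookup`, `rootCause`, and `invokeLookup` are hypothetical stand-ins for illustration, not code from reach or odin; only the standard `java.lang.reflect` API is used.

```scala
import java.lang.reflect.InvocationTargetException
import scala.annotation.tailrec

object UnwrapReflectionError {
  // Hypothetical stand-in for an action that is invoked via reflection.
  class Actions {
    // Throws ArrayIndexOutOfBoundsException when index is not 0 or 1.
    def lookup(index: Int): String = Array("a", "b")(index)
  }

  // Walk past reflection wrappers to the first exception that is not one.
  @tailrec
  def rootCause(t: Throwable): Throwable = t match {
    case e: InvocationTargetException if e.getCause != null => rootCause(e.getCause)
    case other => other
  }

  // Invoke lookup reflectively; on failure, report the unwrapped cause
  // instead of the InvocationTargetException that Method.invoke throws.
  def invokeLookup(index: Int): Either[Throwable, String] = {
    val method = classOf[Actions].getMethod("lookup", classOf[Int])
    try Right(method.invoke(new Actions, Int.box(index)).asInstanceOf[String])
    catch { case e: InvocationTargetException => Left(rootCause(e)) }
  }
}
```

Logging `rootCause(e)` rather than `e` at the point where the error is reported would surface the array index exception directly, assuming the reporting code has access to the original throwable.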

kwalcock commented 3 years ago

This may well have been fixed by https://github.com/clulab/processors/pull/428, which should be in processors 8.2.2, but that is exactly the version that is supposed to be in use...

MihaiSurdeanu commented 3 years ago

Indeed. This should be fixed there... @enoriega: if you can isolate this to a sentence, I will debug this.

kwalcock commented 3 years ago

I know which sentence it is. Will report soon.

On Tue, Jan 5, 2021, 7:32 AM Mihai Surdeanu wrote:

Indeed. This should be fixed there... @enoriega: if you can isolate this to a sentence, I will debug this.

kwalcock commented 3 years ago

It is this sentence, which is almost the only one in the attached file:

This pleiotropic inflammatory cytokine is produced by T cells, monocytes, macrophages and synovial fibroblasts, and mediates various functions by binding to its receptor IL-6R ( 40 ).

AFAICT, the problem is in the processors project, in CoreNLPProcessor.scala around lines 129-130. The lines below do not pass a preferredSize when they call CoreNLPUtils.toDirectedGraph, unlike the code in FastNLPProcessor's parseWithStanford method, which might be used as a template.

doc.sentences(offset).setDependencies(GraphMap.UNIVERSAL_BASIC, CoreNLPUtils.toDirectedGraph(basicDeps, in))
doc.sentences(offset).setDependencies(GraphMap.UNIVERSAL_ENHANCED, CoreNLPUtils.toDirectedGraph(enhancedDeps, in))
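
A minimal sketch of why a missing preferred size can matter, under the assumption that the graph sizes its per-token arrays from the largest token index that appears in any edge. `GraphSizeSketch`, `Edge`, and `incomingEdges` are hypothetical names for illustration and are not the processors API. If trailing tokens (for example closing punctuation) take part in no edge, the inferred size comes out shorter than the sentence, and indexing by a later token position fails.

```scala
object GraphSizeSketch {
  // (source, destination, label): a hypothetical stand-in for a dependency edge.
  type Edge = (Int, Int, String)

  // Build one incoming-edge list per token. Without preferredSize, the array
  // only covers indices up to the largest token index mentioned in an edge.
  def incomingEdges(
      edges: Seq[Edge],
      preferredSize: Option[Int] = None
  ): Array[Vector[(Int, String)]] = {
    val inferred =
      if (edges.isEmpty) 0 else edges.map(e => math.max(e._1, e._2)).max + 1
    val size = math.max(inferred, preferredSize.getOrElse(0))
    val incoming = Array.fill(size)(Vector.empty[(Int, String)])
    for ((src, dst, label) <- edges) incoming(dst) = incoming(dst) :+ ((src, label))
    incoming
  }
}
```

With a four-token sentence whose last token has no edges, `incomingEdges(edges)` yields an array of length 3, while `incomingEdges(edges, Some(4))` covers every token and returns an empty list for the last one.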

Sentence110.nxml.txt

MihaiSurdeanu commented 3 years ago

Thanks! I'll take a look soon.

kwalcock commented 3 years ago

@enoriega's keen observational skills are greatly appreciated. Thanks for taking the time to report the problem.

MihaiSurdeanu commented 3 years ago

Solved in processors PR #439, which is in the process of being tested and then merged.

@enoriega: this means that you have to publishLocal processors 8.2.4-SNAPSHOT, and use this version in reach/processors/build.sbt.
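
A sketch of what that change might look like, assuming the build file pins the processors version in a single value and depends on the `processors-main` and `processors-corenlp` artifacts. The value name `procVer` and the artifact list are assumptions here and should be checked against the actual file. Running `sbt publishLocal` in the processors checkout first makes the snapshot available in the local Ivy repository.

```scala
// reach/processors/build.sbt (sketch; value name and artifact list are assumptions)
val procVer = "8.2.4-SNAPSHOT"

libraryDependencies ++= Seq(
  "org.clulab" %% "processors-main"    % procVer,
  "org.clulab" %% "processors-corenlp" % procVer
)
```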

MihaiSurdeanu commented 3 years ago

The processors PR has been merged.

kwalcock commented 3 years ago

This has likely been handled with https://github.com/clulab/processors/pull/439.

enoriega commented 3 years ago

I tested again with a freshly cloned processors version 8.2.4-SNAPSHOT and am still getting the same error trace. Could it be that there's still a corner case not covered in processors?

PMC2669449.nxml.txt

 ¡¡¡ NxmlReader error !!!

paper: PMC2669449

error:
java.lang.reflect.InvocationTargetException

stack trace:
sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror2.jinvokeraw(JavaMirrors.scala:398)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaMethodMirror.jinvoke(JavaMirrors.scala:354)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror.apply(JavaMirrors.scala:370)
org.clulab.odin.impl.ActionMirror.$anonfun$reflect$1(ActionMirror.scala:23)
org.clulab.odin.impl.GraphExtractor.findAllIn(Extractor.scala:111)
org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1(Extractor.scala:20)
org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1$adapted(Extractor.scala:19)
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
scala.collection.immutable.Range.foreach(Range.scala:158)
scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
org.clulab.odin.impl.Extractor.findAllIn(Extractor.scala:19)
org.clulab.odin.impl.Extractor.findAllIn$(Extractor.scala:18)
org.clulab.odin.impl.GraphExtractor.findAllIn(Extractor.scala:99)
org.clulab.odin.ExtractorEngine.$anonfun$extractFrom$2(ExtractorEngine.scala:45)
scala.collection.TraversableLike$WithFilter.$anonfun$flatMap$2(TraversableLike.scala:773)
scala.collection.Iterator.foreach(Iterator.scala:941)
scala.collection.Iterator.foreach$(Iterator.scala:941)
scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
scala.collection.IterableLike.foreach(IterableLike.scala:74)
scala.collection.IterableLike.foreach$(IterableLike.scala:73)
scala.collection.AbstractIterable.foreach(Iterable.scala:56)
scala.collection.TraversableLike$WithFilter.flatMap(TraversableLike.scala:772)
org.clulab.odin.ExtractorEngine.extract$1(ExtractorEngine.scala:43)
org.clulab.odin.ExtractorEngine.loop$1(ExtractorEngine.scala:34)
org.clulab.odin.ExtractorEngine.extractFrom(ExtractorEngine.scala:56)
org.clulab.reach.assembly.sieves.SieveUtils$.$anonfun$assemblyViaRules$4(Sieves.scala:540)
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
scala.collection.immutable.Map$Map1.foreach(Map.scala:128)
scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
org.clulab.reach.assembly.sieves.SieveUtils$.assemblyViaRules(Sieves.scala:534)
org.clulab.reach.assembly.sieves.PrecedenceSieves.applyPrecedenceRules(Sieves.scala:63)
org.clulab.reach.assembly.sieves.PrecedenceSieves.intrasententialRBPrecedence(Sieves.scala:103)
org.clulab.reach.assembly.Assembler$.$anonfun$applySieves$2(Assembler.scala:146)
org.clulab.reach.assembly.sieves.AssemblySieve$$anon$1.apply(AssemblySieve.scala:32)
org.clulab.reach.assembly.sieves.SieveMixture.apply(AssemblySieve.scala:38)
org.clulab.reach.assembly.sieves.SieveMixture.apply(AssemblySieve.scala:43)
org.clulab.reach.assembly.Assembler$.applySieves(Assembler.scala:158)
org.clulab.reach.assembly.Assembler$.apply(Assembler.scala:116)
org.clulab.reach.ReachCLI.doAssembly(ReachCLI.scala:124)
org.clulab.reach.ReachCLI.outputMentions(ReachCLI.scala:184)
org.clulab.reach.ReachCLI.$anonfun$processPaper$1(ReachCLI.scala:143)
org.clulab.reach.ReachCLI.$anonfun$processPaper$1$adapted(ReachCLI.scala:143)
scala.collection.immutable.List.foreach(List.scala:392)
org.clulab.reach.ReachCLI.processPaper(ReachCLI.scala:143)
org.clulab.reach.ReachCLI.$anonfun$processPapers$3(ReachCLI.scala:90)
org.clulab.reach.ReachCLI.$anonfun$processPapers$3$adapted(ReachCLI.scala:84)
scala.collection.parallel.AugmentedIterableIterator.map2combiner(RemainsIterator.scala:116)
scala.collection.parallel.AugmentedIterableIterator.map2combiner$(RemainsIterator.scala:113)
scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:66)
scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1056)
scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1053)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:160)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal$(Tasks.scala:157)
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:440)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:150)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

==========
kwalcock commented 3 years ago

I'll check.

kwalcock commented 3 years ago

It looks like the same error, but it has a completely different cause. This line in reach does not check the bounds correctly:

case outOfBounds if outOfBounds == -1 || outOfBounds > words.size => false
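
A minimal sketch of one plausible reading of the problem, assuming the guard is meant to reject any index that is not a valid position in `words`. With `>`, an index equal to `words.size` slips through even though the last valid index is `words.size - 1`. `looseCheck` and `strictCheck` are hypothetical names for illustration, and the actual fix in reach may differ.

```scala
object BoundsCheckSketch {
  // Same shape as the quoted guard: index == words.size is wrongly accepted,
  // and negative values other than -1 are accepted as well.
  def looseCheck(index: Int, words: Seq[String]): Boolean = index match {
    case outOfBounds if outOfBounds == -1 || outOfBounds > words.size => false
    case _ => true
  }

  // Accept only indices that can actually be used to access words.
  def strictCheck(index: Int, words: Seq[String]): Boolean =
    index >= 0 && index < words.size
}
```

For a three-word sentence, `looseCheck(3, words)` returns true even though `words(3)` would throw, whereas `strictCheck(3, words)` returns false.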

MihaiSurdeanu commented 3 years ago

Can you please try to fix it? Thank you!

kwalcock commented 3 years ago

Yes, doing so.

kwalcock commented 3 years ago

@enoriega, while it is being tested, approved, and merged, you can use the changes from #719 or the kwalcock-fixes branch.

enoriega commented 3 years ago

Thanks @kwalcock

enoriega commented 3 years ago

I have been running REACH using branch kwalcock-fixes for a while and haven't seen this error. I think it is safe to say it has been fixed now. Thanks @kwalcock

kwalcock commented 3 years ago

Thanks for the update. We're working on the merge to master and a new release.