clulab / reach

Reach Biomedical Information Extraction

Errors during Big Run #357

Open hickst opened 7 years ago

hickst commented 7 years ago

Hitting a few internal errors during the big run. All referenced papers are available on River at: /net/kate/storage/data/nlp/corpora/pmc_openaccess/pmc_aug2016_explorer/nxml

paper: PMC3956712

error:
java.lang.UnsupportedOperationException: tail of empty list

stack trace:
scala.collection.immutable.Nil$.tail(List.scala:422)
scala.collection.immutable.Range.tail(Range.scala:223)
org.clulab.reach.darpa.DarpaActions$.consecutivePreps(DarpaActions.scala:616)
org.clulab.reach.darpa.DarpaActions$$anonfun$proteinBetween$1$$anonfun$apply$mcVI$sp$3$$anonfun$apply$3.apply$mcZI$sp(DarpaActions.scala:601)
org.clulab.reach.darpa.DarpaActions$$anonfun$proteinBetween$1$$anonfun$apply$mcVI$sp$3$$anonfun$apply$3.apply(DarpaActions.scala:598)
org.clulab.reach.darpa.DarpaActions$$anonfun$proteinBetween$1$$anonfun$apply$mcVI$sp$3$$anonfun$apply$3.apply(DarpaActions.scala:598)
scala.collection.TraversableLike$WithFilter$$anonfun$withFilter$1.apply(TraversableLike.scala:744)
scala.collection.TraversableLike$WithFilter$$anonfun$withFilter$1.apply(TraversableLike.scala:744)
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
scala.collection.immutable.List.foreach(List.scala:381)
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
org.clulab.reach.darpa.DarpaActions$$anonfun$proteinBetween$1$$anonfun$apply$mcVI$sp$3.apply(DarpaActions.scala:598)
org.clulab.reach.darpa.DarpaActions$$anonfun$proteinBetween$1$$anonfun$apply$mcVI$sp$3.apply(DarpaActions.scala:596)
scala.collection.Iterator$class.foreach(Iterator.scala:893)
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
org.clulab.reach.darpa.DarpaActions$$anonfun$proteinBetween$1.apply$mcVI$sp(DarpaActions.scala:596)
org.clulab.reach.darpa.DarpaActions$$anonfun$proteinBetween$1.apply(DarpaActions.scala:595)
org.clulab.reach.darpa.DarpaActions$$anonfun$proteinBetween$1.apply(DarpaActions.scala:595)
scala.collection.Iterator$class.foreach(Iterator.scala:893)
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
org.clulab.struct.Interval.foreach(Interval.scala:11)
org.clulab.reach.darpa.DarpaActions$.proteinBetween(DarpaActions.scala:595)
org.clulab.reach.darpa.DarpaActions$$anonfun$validArguments$1.apply(DarpaActions.scala:578)
org.clulab.reach.darpa.DarpaActions$$anonfun$validArguments$1.apply(DarpaActions.scala:577)
scala.collection.immutable.List.foreach(List.scala:381)
org.clulab.reach.darpa.DarpaActions$.validArguments(DarpaActions.scala:577)
org.clulab.reach.darpa.DarpaActions$$anonfun$keepIfValidArgs$1.apply(DarpaActions.scala:296)
org.clulab.reach.darpa.DarpaActions$$anonfun$keepIfValidArgs$1.apply(DarpaActions.scala:296)
scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
scala.collection.Iterator$class.foreach(Iterator.scala:893)
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
scala.collection.AbstractTraversable.filter(Traversable.scala:104)
org.clulab.reach.darpa.DarpaActions.keepIfValidArgs(DarpaActions.scala:296)
org.clulab.reach.darpa.DarpaActions.cleanupEvents(DarpaActions.scala:341)
org.clulab.reach.ReachSystem$$anonfun$1.apply(ReachSystem.scala:43)
org.clulab.reach.ReachSystem$$anonfun$1.apply(ReachSystem.scala:43)
org.clulab.odin.ExtractorEngine.extract$1(ExtractorEngine.scala:43)
org.clulab.odin.ExtractorEngine.loop$1(ExtractorEngine.scala:28)
org.clulab.odin.ExtractorEngine.extractFrom(ExtractorEngine.scala:50)
org.clulab.odin.ExtractorEngine.extractByType(ExtractorEngine.scala:57)
org.clulab.reach.ReachSystem.extractEventsFrom(ReachSystem.scala:184)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:80)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:146)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:140)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:66)
org.clulab.reach.PaperReader$.getMentionsFromEntry(PaperReader.scala:133)
org.clulab.reach.ReachCLI.processPaper(ReachCLI.scala:104)
org.clulab.reach.ReachCLI$$anonfun$2.apply(ReachCLI.scala:53)
org.clulab.reach.ReachCLI$$anonfun$2.apply(ReachCLI.scala:49)
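
The top two frames show `Range.tail` delegating to `Nil.tail`, which suggests `consecutivePreps` called `.tail` on an empty `Range`. Below is a minimal sketch of that failure mode and one possible guard. The helper names `unsafeRest` and `safeRest` are hypothetical and are not taken from `DarpaActions.scala`; the actual logic in `consecutivePreps` may differ.

```scala
import scala.util.Try

object EmptyRangeTail {
  // Hypothetical helper (not from Reach): all indices in [start, end) except the first.
  // `.tail` throws when the range is empty, i.e. when start >= end.
  def unsafeRest(start: Int, end: Int): Range = (start until end).tail

  // `drop(1)` returns an empty range instead of throwing.
  def safeRest(start: Int, end: Int): Range = (start until end).drop(1)

  def main(args: Array[String]): Unit = {
    // A Failure; the exact exception type depends on the Scala version.
    println(Try(unsafeRest(5, 5)))
    println(safeRest(5, 5).isEmpty) // true
    println(safeRest(2, 5).toList)  // List(3, 4)
  }
}
```

Whether an empty result is the right behavior here, or whether the caller should skip empty spans entirely, depends on what `proteinBetween` expects.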

hickst commented 7 years ago

paper: PMC1977042

error:
java.lang.NullPointerException

stack trace:
org.clulab.reach.context.RuleBasedContextEngine.query(RuleBasedEngine.scala:126)
org.clulab.reach.context.RuleBasedContextEngine$$anonfun$assign$1.apply(RuleBasedEngine.scala:102)
org.clulab.reach.context.RuleBasedContextEngine$$anonfun$assign$1.apply(RuleBasedEngine.scala:93)
scala.collection.Iterator$class.foreach(Iterator.scala:893)
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
org.clulab.reach.context.RuleBasedContextEngine.assign(RuleBasedEngine.scala:92)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:83)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:146)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:140)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:66)
org.clulab.reach.PaperReader$.getMentionsFromEntry(PaperReader.scala:133)
org.clulab.reach.ReachCLI.processPaper(ReachCLI.scala:104)
org.clulab.reach.ReachCLI$$anonfun$2.apply(ReachCLI.scala:53)
org.clulab.reach.ReachCLI$$anonfun$2.apply(ReachCLI.scala:49)

hickst commented 7 years ago

This error looks like the previous one, but in a different paper:

paper: PMC1567125

error:
java.lang.NullPointerException

stack trace:
org.clulab.reach.context.RuleBasedContextEngine.query(RuleBasedEngine.scala:126)
org.clulab.reach.context.RuleBasedContextEngine$$anonfun$assign$1.apply(RuleBasedEngine.scala:102)
org.clulab.reach.context.RuleBasedContextEngine$$anonfun$assign$1.apply(RuleBasedEngine.scala:93)
scala.collection.Iterator$class.foreach(Iterator.scala:893)
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
org.clulab.reach.context.RuleBasedContextEngine.assign(RuleBasedEngine.scala:92)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:83)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:146)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:140)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:66)
org.clulab.reach.PaperReader$.getMentionsFromEntry(PaperReader.scala:133)
org.clulab.reach.ReachCLI.processPaper(ReachCLI.scala:104)
org.clulab.reach.ReachCLI$$anonfun$2.apply(ReachCLI.scala:53)
org.clulab.reach.ReachCLI$$anonfun$2.apply(ReachCLI.scala:49)

hickst commented 7 years ago

Update: I just looked at the 7 errors so far in processing: 6 are the RuleBasedEngine NullPointerException and 1 is the "tail of empty list" error (examples of both are shown above).

enoriega commented 7 years ago

The NullPointerException is fixed now.

myedibleenso commented 7 years ago

@hickst, can you provide the commit hash you used for this run? I'm trying to track down the first error (tail of empty list).

Thanks!

enoriega commented 7 years ago

The fix is in branch Issue375_context_engine

The commit hash is 6785d0971fb2eebae2bbd03fe01d9d1f1c1df2d8

myedibleenso commented 7 years ago

@enoriega, I'm guessing that is the hash for your fix, right? Can you put in a PR for the context engine fix, or is there still work that needs to be done?

hickst commented 7 years ago

A somewhat disturbing error occurred right at the very end of one of the BigRun batches. Disturbing because this looks like the kind of error one might get chasing a cycle in a data structure (Gus just questioned this possibility at our last weekly meeting):

[error] (run-main-0) java.lang.StackOverflowError
java.lang.StackOverflowError
        at edu.stanford.nlp.graph.DirectedMultiGraph$EdgeIterator.primeIterator(DirectedMultiGraph.java:542)
        at edu.stanford.nlp.graph.DirectedMultiGraph$EdgeIterator.primeIterator(DirectedMultiGraph.java:542)
        at edu.stanford.nlp.graph.DirectedMultiGraph$EdgeIterator.primeIterator(DirectedMultiGraph.java:542)
        at edu.stanford.nlp.graph.DirectedMultiGraph$EdgeIterator.primeIterator(DirectedMultiGraph.java:542)

... this frame repeats 1024 times, then the JVM died, leaving 4 files unprocessed from that batch.
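
For reference, if a cycle does turn out to be involved, the usual guard is a traversal that tracks visited nodes and keeps its work list on the heap rather than the call stack. This is only a sketch over a plain adjacency map; it does not use or reproduce Stanford's `DirectedMultiGraph`, and the trace alone does not confirm that a cycle is the cause.

```scala
import scala.collection.mutable

object CycleSafeTraversal {
  // Iterative depth-first search with a visited set.
  // Terminates on cyclic graphs and does not recurse, so depth cannot overflow the stack.
  def reachable(graph: Map[Int, List[Int]], start: Int): Set[Int] = {
    val visited = mutable.Set.empty[Int]
    val pending = mutable.ArrayBuffer(start)
    while (pending.nonEmpty) {
      val node = pending.remove(pending.length - 1)
      // `add` returns true only the first time a node is seen.
      if (visited.add(node)) {
        pending ++= graph.getOrElse(node, Nil)
      }
    }
    visited.toSet
  }

  def main(args: Array[String]): Unit = {
    // Illustrative graph with a cycle: 1 -> 2 -> 3 -> 1.
    val cyclic = Map(1 -> List(2), 2 -> List(3), 3 -> List(1))
    println(reachable(cyclic, 1))
  }
}
```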

By comparing start messages to finish messages in the log file, I believe the culprit is PMC4370879, but since a total of five files were left unfinished (out of 60685), it could also be one of PMC4014127, PMC4045150, PMC4230053, or PMC4372909.
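
The start/finish comparison can be scripted roughly as follows. The `START <id>` and `FINISH <id>` line formats are assumptions made for illustration; the real Reach log messages are not shown in this thread and will need different patterns. The sample IDs in `main` are made up.

```scala
object UnfinishedPapers {
  // ASSUMPTION: hypothetical log line formats, not the actual Reach messages.
  private val Start = """START (PMC\d+)""".r
  private val Finish = """FINISH (PMC\d+)""".r

  // Papers that have a start message but no matching finish message.
  def unfinished(lines: Seq[String]): Set[String] = {
    val started = lines.collect { case Start(id) => id }.toSet
    val finished = lines.collect { case Finish(id) => id }.toSet
    started.diff(finished)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample lines, for illustration only.
    val sample = Seq(
      "START PMC0000001",
      "START PMC0000002",
      "FINISH PMC0000001"
    )
    println(unfinished(sample)) // Set(PMC0000002)
  }
}
```

With several papers in flight at once, this only narrows the list of candidates; each one still has to be rerun on its own to identify the one that triggers the overflow.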