Open herongrove opened 6 years ago
Yuck. How many papers (out of the total) have been processed? Does this occur near the end of the papers? (Run `wc -l` on `output/restart.log` to see how many were successfully processed.)
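A minimal sketch of that check, assuming (as the comment implies) that the restart log holds one line per successfully processed paper. The helper name `count_processed` is hypothetical and not part of the project's scripts.

```shell
#!/bin/sh
# Hypothetical helper: print the number of lines in a restart log.
# Assumption: one line per successfully processed paper.
count_processed() {
    log_file="$1"
    # Reading from stdin makes wc print only the count, without the file name;
    # tr strips the padding some wc implementations add.
    wc -l < "$log_file" | tr -d ' '
}

# Example usage with the path mentioned above; skipped if the file is absent.
if [ -f output/restart.log ]; then
    echo "processed so far: $(count_processed output/restart.log)"
fi
```

Comparing this number against the total number of input papers shows how far the run got before the errors began.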
The first time, it occurred after about 50k of 100k papers. The second time it was at about 80k, and the third time at 87k.
OK, thanks. And how many papers failed in this manner each time?
I didn't have the foresight to print the logs to files, but from what didn't get erased in my tmux window, it looks like more than 5.
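For future runs, one way to keep the console output is to pipe it through `tee`. This is only a sketch: the wrapper name `run_logged` and the log file name in the usage comment are made up for illustration, and it assumes the script writes its messages to stdout or stderr.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, show its output, and append it to a file.
run_logged() {
    log_file="$1"
    shift
    # 2>&1 merges stderr into stdout so error messages are captured too;
    # tee -a appends, so restarts do not overwrite earlier output.
    "$@" 2>&1 | tee -a "$log_file"
}

# Example usage (file name is an assumption):
#   run_logged reach-console.log ./runReachCLI.sh
```

Appending rather than truncating keeps the output of all restarts in one place, which helps when comparing failures across runs.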
These kinds of distributed problems are really hard to debug. At a minimum, I'm going to need to see the log file (look for `reach.log`). It seems like there may be an uncaught error that happens very sporadically, after which the system recovers and continues. Note that after the original error, the remoting library imposes a 5-second ban on client connections, which could cause other secondary errors. If we can get the log file, we can see whether all the errors are the same (and hopefully get more information).
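Once the log file is available, a quick tally can show whether the errors are all the same. The sketch below assumes error blocks of the form that appears in this thread: a `paper:` line naming the paper, then a line containing only `error:`, followed by the exception message on the next line. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a first pass over a log such as reach.log.
# Assumption: each error block has a "paper: <id>" line and an "error:" line
# whose following line is the exception message.

# Count distinct exception messages, most frequent first.
summarize_errors() {
    awk 'want { print; want = 0 } /^error:[[:space:]]*$/ { want = 1 }' "$1" \
        | sort | uniq -c | sort -rn
}

# List the distinct papers that reported an error.
failed_papers() {
    awk '/^paper: / { print $2 }' "$1" | sort -u
}

# Example usage:
#   summarize_errors reach.log
#   failed_papers reach.log | wc -l
```

If one message dominates, the failures probably share a root cause; a mix of messages would be consistent with secondary errors following the first one.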
Dane retrieved the log file, and I notice that the very first error is a CMU-exporter error (i.e., not related to Akka). It seems likely (IMHO) to have been caused by a bad data structure: the trace below shows `head` being called on an empty collection in CMUExporter.createMechanismType. The log shows several other similar types of problems.
17:51:42.902 [ForkJoinPool-1-worker-943] ERROR org.clulab.reach.ReachCLI -
==========
¡¡¡ NxmlReader error !!!
paper: PMC1877819
error:
java.util.NoSuchElementException: next on empty iterator
stack trace:
scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
scala.collection.IterableLike$class.head(IterableLike.scala:107)
scala.collection.AbstractIterable.head(Iterable.scala:54)
org.clulab.reach.export.cmu.CMUExporter.createMechanismType(CMUExporter.scala:25)
org.clulab.reach.export.cmu.CMUExporter$$anonfun$1.apply(CMUExporter.scala:106)
org.clulab.reach.export.cmu.CMUExporter$$anonfun$1.apply(CMUExporter.scala:96)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:316)
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
scala.collection.SetLike$class.map(SetLike.scala:92)
scala.collection.AbstractSet.map(Set.scala:47)
org.clulab.reach.export.cmu.CMUExporter.getRows(CMUExporter.scala:96)
org.clulab.reach.assembly.export.AssemblyExporter.rowsToString(AssemblyExporter.scala:209)
org.clulab.reach.export.cmu.CMUExporter$.tabularOutput(CMUExporter.scala:206)
org.clulab.reach.ReachCLI.outputMentions(ReachCLI.scala:226)
org.clulab.reach.ReachCLI$$anonfun$processPaper$1.apply(ReachCLI.scala:140)
org.clulab.reach.ReachCLI$$anonfun$processPaper$1.apply(ReachCLI.scala:140)
scala.collection.immutable.List.foreach(List.scala:381)
org.clulab.reach.ReachCLI.processPaper(ReachCLI.scala:140)
org.clulab.reach.ReachCLI$$anonfun$4.apply(ReachCLI.scala:87)
org.clulab.reach.ReachCLI$$anonfun$4.apply(ReachCLI.scala:81)
scala.collection.parallel.AugmentedIterableIterator$class.map2combiner(RemainsIterator.scala:115)
scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:62)
scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1054)
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1051)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:159)
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149)
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
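To find where a trace like this enters project code, it can help to drop the library frames. The sketch below keeps only lines that begin with a given package prefix; the function name `app_frames` and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the stack frames that start with a package prefix.
app_frames() {
    prefix="$1"
    trace_file="$2"
    # Trim leading whitespace, then keep lines that begin with the prefix.
    # index(...) == 1 is a literal match, so dots in the prefix are not wildcards.
    awk -v p="$prefix" '{ sub(/^[[:space:]]+/, "") } index($0, p) == 1' "$trace_file"
}

# Example usage, with the trace saved to a file:
#   app_frames org.clulab. trace.txt | head -n 1
```

Applied to the trace above, the first project frame is `org.clulab.reach.export.cmu.CMUExporter.createMechanismType(CMUExporter.scala:25)`, the call that invokes `head`.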
When running runReachCLI.sh, I'm getting many similar instances of this error (but only after it's been running for days):