Closed by olegs 5 years ago
Looks like a problem with viktor's Viewer property and concurrent access. For whatever reason, F64Array.V is defined as lazy(LazyThreadSafetyMode.NONE) { Viewer(this) }, and the KDoc for this mode explicitly states:
No locks are used to synchronize an access to the [Lazy] instance value; if the instance is accessed from multiple threads, its behavior is undefined. This mode should not be used unless the [Lazy] instance is guaranteed never to be initialized from more than one thread.
It's hard to say what the author's motivation was, but we definitely do call V from more than one thread, so we should change the mode to e.g. PUBLICATION.
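A minimal sketch of what the proposed change could look like. Data and View are hypothetical stand-ins for F64Array and Viewer, and readConcurrently is an illustrative helper; none of this is viktor's actual code. It only shows a lazy property declared with PUBLICATION being read from several threads at once.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-ins for viktor's F64Array and Viewer (not the real classes).
class View(val owner: Data)

class Data(val values: DoubleArray) {
    // PUBLICATION: the initializer may run on several threads concurrently,
    // but only the first value that gets published is returned to all callers.
    val V: View by lazy(LazyThreadSafetyMode.PUBLICATION) { View(this) }
}

// Reads `V` from `threads` threads at once and returns what each thread saw.
fun readConcurrently(data: Data, threads: Int): List<View> {
    val pool = Executors.newFixedThreadPool(threads)
    val start = CountDownLatch(1)
    try {
        val futures = (1..threads).map {
            pool.submit(Callable {
                start.await() // line the threads up to maximize contention
                data.V
            })
        }
        start.countDown()
        return futures.map { it.get(10, TimeUnit.SECONDS) }
    } finally {
        pool.shutdown()
    }
}
```

Calling readConcurrently(Data(doubleArrayOf(1.0, 2.0)), 8) should return eight references to the same View instance. PUBLICATION avoids taking a lock on every read, at the cost of possibly constructing a few throwaway View objects; SYNCHRONIZED (the default mode of lazy) would be the alternative if the initializer must run exactly once.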
If I'm right, the problem is a race condition and should be hard to reproduce.
Indeed, it looks like a race condition; I cannot reproduce it on purpose.
Then it's actually an issue of viktor and should be transferred there. I'll take care of it.
Even though it looks like a race condition, its severity is quite high: I was able to stumble across the same issue twice in a row. It happened while processing ChIP-Seqs on a massive scale.
[Oct 28, 2019 20:18:57] 0.00% (0/100), Elapsed time: 17 μs
[Oct 28, 2019 20:19:03] Model fit: recalculating /mnt/stripe/bio/raw-data/geo-samples/GSE53643/span/fit/GSM1297971_CCR4+_Tcells_H3K4me2_200.span: [FAILED] after 5.029 min
Caused by: ERROR
kotlin.KotlinNullPointerException
at kotlin.UnsafeLazyImpl.getValue(Lazy.kt:81)
at org.jetbrains.bio.viktor.F64Array.getV(F64Array.kt)
at org.jetbrains.bio.statistics.hmm.HMMInternals.logBackward(Internals.kt:83)
at org.jetbrains.bio.statistics.hmm.HMMIterationContext$expect$2.run(HMMIterationContext.kt:48)
at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:389)
at java.util.concurrent.ForkJoinTask.invokeAll(ForkJoinTask.java:761)
at org.jetbrains.bio.statistics.hmm.HMMIterationContext.expect(HMMIterationContext.kt:40)
at org.jetbrains.bio.statistics.IterationContext.iterate(ClassificationModel.kt:290)
at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$fit$1.invoke(MLAbstractHMM.kt:80)
at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$fit$1.invoke(MLAbstractHMM.kt:17)
at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$sam$java_util_function_Consumer$0.accept(MLAbstractHMM.kt)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1376)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
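One plausible interleaving behind the NullPointerException in the trace above, replayed deterministically on a single thread. This is a sketch under an assumption: that the unsynchronized lazy holder checks for an uninitialized value, runs its initializer, and then clears the initializer reference. UnsafeHolder and replayRace are hypothetical names used only for illustration, not the Kotlin standard library's code.

```kotlin
// Hypothetical, simplified model of an unsynchronized lazy holder.
class UnsafeHolder<T : Any>(initializer: () -> T) {
    private var initializer: (() -> T)? = initializer
    private var value: T? = null

    // First half of a read: decide whether initialization is still needed.
    fun needsInit(): Boolean = value == null

    // Second half of a read: run the initializer, then drop the reference to it.
    fun init(): T {
        val v = initializer!!() // throws if another "thread" already cleared it
        value = v
        initializer = null
        return v
    }
}

// Replays the interleaving: both readers pass the check before either initializes.
// Returns true if the second reader fails with a NullPointerException.
fun replayRace(): Boolean {
    val holder = UnsafeHolder { "viewer" }
    val aNeedsInit = holder.needsInit() // thread A: value not set yet
    val bNeedsInit = holder.needsInit() // thread B: value not set yet either
    if (aNeedsInit) holder.init()       // thread A initializes and clears the initializer
    return try {
        if (bNeedsInit) holder.init()   // thread B acts on its stale check
        false
    } catch (e: NullPointerException) {
        true
    }
}
```

In this model replayRace() returns true: the second reader trusts a check that is no longer valid and dereferences an initializer that has already been cleared. With real threads the window is tiny, which would be consistent with the failure showing up only under heavy parallel load.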
Fixed by 1efb244c0f11dccf93ff3588709735485151a957.
Happened with SPAN version: https://download.jetbrains.com/biolabs/span/span-0.11.0.4882.jar
cd /mnt/stripe/bio/raw-data/geo-samples/GSE53643
SPAN output: