JetBrains-Research / viktor

Efficient f64-only ndarray in Kotlin
MIT License
125 stars 6 forks source link

NPE #22

Closed olegs closed 5 years ago

olegs commented 5 years ago

Happened with SPAN version: https://download.jetbrains.com/biolabs/span/span-0.11.0.4882.jar cd /mnt/stripe/bio/raw-data/geo-samples/GSE53643 SPAN output:

[Oct 26, 2019 02:15:45] COMMAND:
analyze -t bams/GSM1297966_CCR4+_Tcells_H3K4me2.bam --chrom.sizes hg38.chrom.sizes --peaks span/GSM1297966_CCR4+_Tcells_H3K4me2_200_1E-6_5.peak --model span/fit/GSM1297966_CCR4+_Tcells_H3K4me2_200.span
--workdir span --threads 4 --bin 200 --fdr 1E-6 --gap 5
[Oct 26, 2019 02:15:45] LOG: /mnt/stripe/bio/raw-data/geo-samples/GSE53643/span/logs/GSM1297966_CCR4+_Tcells_H3K4me2_200_1E-6_5.log
[Oct 26, 2019 02:15:45] MODEL: /mnt/stripe/bio/raw-data/geo-samples/GSE53643/span/fit/GSM1297966_CCR4+_Tcells_H3K4me2_200.span
[Oct 26, 2019 02:15:45] WORKING DIR: /mnt/stripe/bio/raw-data/geo-samples/GSE53643/span
[Oct 26, 2019 02:15:45] TREATMENT: /mnt/stripe/bio/raw-data/geo-samples/GSE53643/bams/GSM1297966_CCR4+_Tcells_H3K4me2.bam
[Oct 26, 2019 02:15:45] CONTROL: none
[Oct 26, 2019 02:15:45] BIN: 200
[Oct 26, 2019 02:15:45] FRAGMENT: auto
[Oct 26, 2019 02:15:45] KEEP DUPLICATES: false
[Oct 26, 2019 02:15:45] FDR: 1.0E-6
[Oct 26, 2019 02:15:45] GAP: 5
[Oct 26, 2019 02:15:45] PEAKS: /mnt/stripe/bio/raw-data/geo-samples/GSE53643/span/GSM1297966_CCR4+_Tcells_H3K4me2_200_1E-6_5.peak
[Oct 26, 2019 02:15:45] THREADS: 4
[Oct 26, 2019 02:15:45] Coverage for GSM1297966_CCR4+_Tcells_H3K4me2.bam: recalculating /mnt/stripe/bio/raw-data/geo-samples/GSE53643/span/cache/coverage_GSM1297966_CCR4+_Tcells_H3K4me2_unique#1fd36.npz
...
[Oct 26, 2019 02:15:45] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 1, Elapsed time: 21 ms
[Oct 26, 2019 02:15:55] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 2,973,774, Elapsed time: 10 s, Throughput: 3 μs/item
[Oct 26, 2019 02:16:06] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 3,147,767, Elapsed time: 20 s, Throughput: 6 μs/item
[Oct 26, 2019 02:16:16] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 3,502,374, Elapsed time: 30 s, Throughput: 8 μs/item
[Oct 26, 2019 02:16:26] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 7,356,324, Elapsed time: 40 s, Throughput: 5 μs/item
[Oct 26, 2019 02:16:36] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 7,556,450, Elapsed time: 50 s, Throughput: 6 μs/item
[Oct 26, 2019 02:16:46] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 7,893,051, Elapsed time: 1 min, Throughput: 7 μs/item
[Oct 26, 2019 02:16:56] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 8,062,617, Elapsed time: 1 min 10 s, Throughput: 8 μs/item
[Oct 26, 2019 02:17:06] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 8,302,656, Elapsed time: 1 min 20 s, Throughput: 9 μs/item
[Oct 26, 2019 02:17:16] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 8,470,841, Elapsed time: 1 min 30 s, Throughput: 10 μs/item
[Oct 26, 2019 02:17:26] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 8,687,385, Elapsed time: 1 min 41 s, Throughput: 11 μs/item
[Oct 26, 2019 02:17:37] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 8,886,709, Elapsed time: 1 min 51 s, Throughput: 12 μs/item
[Oct 26, 2019 02:17:47] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 10,977,979, Elapsed time: 2 min 1 s, Throughput: 11 μs/item
[Oct 26, 2019 02:17:53] Loading reads GSM1297966_CCR4+_Tcells_H3K4me2.bam: 14,973,065, Elapsed time: 2 min 8 s, Throughput: 8 μs/item [done]
[Oct 26, 2019 02:18:05] Coverage for GSM1297966_CCR4+_Tcells_H3K4me2.bam: recalculating /mnt/stripe/bio/raw-data/geo-samples/GSE53643/span/cache/coverage_GSM1297966_CCR4+_Tcells_H3K4me2_unique#1fd36.npz
: done in 2.337 min
[Oct 26, 2019 02:18:05] chr19_GL949749v2_alt: no reads detected, ignoring.
[Oct 26, 2019 02:18:05] chr19_GL949750v2_alt: no reads detected, ignoring.
[Oct 26, 2019 02:18:05] chr19_GL949748v2_alt: no reads detected, ignoring.
[Oct 26, 2019 02:18:05] chr19_GL949751v2_alt: no reads detected, ignoring.
[Oct 26, 2019 02:18:05] chr19_GL949753v2_alt: no reads detected, ignoring.
[Oct 26, 2019 02:18:05] chr15_KI270852v1_alt: no reads detected, ignoring.
[Oct 26, 2019 02:18:05] chr15_KI270727v1_random: no reads detected, ignoring.
...
[Oct 26, 2019 02:23:51] Model fit: recalculating /mnt/stripe/bio/raw-data/geo-samples/GSE53643/span/fit/GSM1297966_CCR4+_Tcells_H3K4me2_200.span: [FAILED] after 5.762 min
Caused by: ERROR
kotlin.KotlinNullPointerException
        at kotlin.UnsafeLazyImpl.getValue(Lazy.kt:81)
        at org.jetbrains.bio.viktor.F64Array.getV(F64Array.kt)
        at org.jetbrains.bio.statistics.hmm.HMMInternals.logBackward(Internals.kt:83)
        at org.jetbrains.bio.statistics.hmm.HMMIterationContext$expect$2.run(HMMIterationContext.kt:48)
        at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:389)
        at java.util.concurrent.ForkJoinTask.invokeAll(ForkJoinTask.java:761)
        at org.jetbrains.bio.statistics.hmm.HMMIterationContext.expect(HMMIterationContext.kt:40)
        at org.jetbrains.bio.statistics.IterationContext.iterate(ClassificationModel.kt:290)
        at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$fit$1.invoke(MLAbstractHMM.kt:80)
        at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$fit$1.invoke(MLAbstractHMM.kt:17)
        at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$sam$java_util_function_Consumer$0.accept(MLAbstractHMM.kt)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1376)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
        at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

kotlin.KotlinNullPointerException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:598)
        at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
        at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
        at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:583)
        at org.jetbrains.bio.statistics.hmm.MLAbstractHMM.fit(MLAbstractHMM.kt:80)
        at org.jetbrains.bio.statistics.Fitter$DefaultImpls.fit(ClassificationModel.kt:186)
        at org.jetbrains.bio.statistics.Fitter$multiStarted$1.fit(ClassificationModel.kt:231)
        at org.jetbrains.bio.experiments.fit.SpanPeakCallingExperiment$Companion$semanticCheck$1.fit(SpanPeakCallingExperiment.kt:125)
        at org.jetbrains.bio.experiments.fit.SpanPeakCallingExperiment$Companion$semanticCheck$1.fit(SpanPeakCallingExperiment.kt:109)
        at org.jetbrains.bio.statistics.Fitter$DefaultImpls.fit$default(ClassificationModel.kt:180)
        at org.jetbrains.bio.experiments.fit.SpanModelFitExperiment.calculateModel(SpanModelFitExperiment.kt:303)
        at org.jetbrains.bio.experiments.fit.SpanModelFitExperiment.access$calculateModel(SpanModelFitExperiment.kt:262)
        at org.jetbrains.bio.experiments.fit.SpanModelFitExperiment$getOrLoadResults$1.invoke(SpanModelFitExperiment.kt:342)
        at org.jetbrains.bio.experiments.fit.SpanModelFitExperiment$getOrLoadResults$1.invoke(SpanModelFitExperiment.kt:262)
        at org.jetbrains.bio.util.PathExtensionsKt$checkOrRecalculate$1$$special$$inlined$time$lambda$1.invoke(PathExtensions.kt:326)
        at org.jetbrains.bio.util.PathExtensionsKt$checkOrRecalculate$1$$special$$inlined$time$lambda$1.invoke(PathExtensions.kt)
        at org.jetbrains.bio.util.PathExtensionsKt$checkOrRecalculate$1.invoke(PathExtensions.kt:467)
        at org.jetbrains.bio.util.PathExtensionsKt$checkOrRecalculate$1.invoke(PathExtensions.kt)
        at org.jetbrains.bio.util.LockManager.synchronized(LockManager.kt:39)
        at org.jetbrains.bio.util.PathExtensionsKt.checkOrRecalculate(PathExtensions.kt:307)
        at org.jetbrains.bio.util.PathExtensionsKt.checkOrRecalculate$default(PathExtensions.kt:304)
        at org.jetbrains.bio.experiments.fit.SpanModelFitExperiment.getOrLoadResults(SpanModelFitExperiment.kt:339)
        at org.jetbrains.bio.experiments.fit.SpanModelFitExperiment.access$getOrLoadResults(SpanModelFitExperiment.kt:262)
        at org.jetbrains.bio.experiments.fit.SpanModelFitExperiment$results$2.invoke(SpanModelFitExperiment.kt:281)
        at org.jetbrains.bio.experiments.fit.SpanModelFitExperiment$results$2.invoke(SpanModelFitExperiment.kt:262)
        at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
        at org.jetbrains.bio.experiments.fit.SpanModelFitExperiment.getResults(SpanModelFitExperiment.kt)
        at org.jetbrains.bio.span.SpanCLA$peakCallingResults$1.invoke(SpanCLA.kt:650)
        at org.jetbrains.bio.span.SpanCLA$peakCallingResults$1.invoke(SpanCLA.kt:36)
        at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
        at org.jetbrains.bio.span.SpanCLA$analyze$$inlined$with$lambda$1.invoke(SpanCLA.kt:146)
        at org.jetbrains.bio.span.SpanCLA$analyze$$inlined$with$lambda$1.invoke(SpanCLA.kt:36)
        at org.jetbrains.bio.util.OptionParserExtensionsKt.parse(OptionParserExtensions.kt:70)
        at org.jetbrains.bio.util.OptionParserExtensionsKt.parse$default(OptionParserExtensions.kt:26)
        at org.jetbrains.bio.span.SpanCLA.analyze(SpanCLA.kt:116)
        at org.jetbrains.bio.span.SpanCLA.main(SpanCLA.kt:79)
Caused by: kotlin.KotlinNullPointerException
        at kotlin.UnsafeLazyImpl.getValue(Lazy.kt:81)
        at org.jetbrains.bio.viktor.F64Array.getV(F64Array.kt)
        at org.jetbrains.bio.statistics.hmm.HMMInternals.logBackward(Internals.kt:83)
        at org.jetbrains.bio.statistics.hmm.HMMIterationContext$expect$2.run(HMMIterationContext.kt:48)
        at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:389)
        at java.util.concurrent.ForkJoinTask.invokeAll(ForkJoinTask.java:761)
        at org.jetbrains.bio.statistics.hmm.HMMIterationContext.expect(HMMIterationContext.kt:40)
        at org.jetbrains.bio.statistics.IterationContext.iterate(ClassificationModel.kt:290)
        at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$fit$1.invoke(MLAbstractHMM.kt:80)
        at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$fit$1.invoke(MLAbstractHMM.kt:17)
        at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$sam$java_util_function_Consumer$0.accept(MLAbstractHMM.kt)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1376)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
        at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Exception in thread "main" kotlin.KotlinNullPointerException
        at org.jetbrains.bio.util.OptionParserExtensionsKt.parse(OptionParserExtensions.kt:101)
        at org.jetbrains.bio.util.OptionParserExtensionsKt.parse$default(OptionParserExtensions.kt:26)
        at org.jetbrains.bio.span.SpanCLA.analyze(SpanCLA.kt:116)
        at org.jetbrains.bio.span.SpanCLA.main(SpanCLA.kt:79)
dievsky commented 5 years ago

Looks like a problem with viktor's Viewer property and concurrent access. For whatever reason, F64Array.V is defined as lazy(LazyThreadSafetyMode.NONE) { Viewer(this) }, and KDoc for this mode explicitly states:

No locks are used to synchronize an access to the [Lazy] instance value; if the instance is accessed from multiple threads, its behavior is undefined. This mode should not be used unless the [Lazy] instance is guaranteed never to be initialized from more than one thread.

It's hard to say what the author's motivation was, but we definitely do call V from more than one thread, so we should change the mode to e.g. PUBLICATION.

If I'm right, the problem is a race condition and should be hardly reproducible.

olegs commented 5 years ago

Indeed, looks like a race condition, cannot reproduce it on purpose.

dievsky commented 5 years ago

Then it's actually an issue of viktor and should be transferred there. I'll take care of it.

olegs commented 5 years ago

Even though, it looks like a race condition it's severity is quite high, I was able to stumble across the same issue twice in a row. It happened while processing ChIP-Seqs on a massive scale.

[Oct 28, 2019 20:18:57] 0.00% (0/100), Elapsed time: 17 μs
[Oct 28, 2019 20:19:03] Model fit: recalculating /mnt/stripe/bio/raw-data/geo-samples/GSE53643/span/fit/GSM1297971_CCR4+_Tcells_H3K4me2_200.span: [FAILED] after 5.029 min
Caused by: ERROR
kotlin.KotlinNullPointerException
        at kotlin.UnsafeLazyImpl.getValue(Lazy.kt:81)
        at org.jetbrains.bio.viktor.F64Array.getV(F64Array.kt)
        at org.jetbrains.bio.statistics.hmm.HMMInternals.logBackward(Internals.kt:83)
        at org.jetbrains.bio.statistics.hmm.HMMIterationContext$expect$2.run(HMMIterationContext.kt:48)
        at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:389)
        at java.util.concurrent.ForkJoinTask.invokeAll(ForkJoinTask.java:761)
        at org.jetbrains.bio.statistics.hmm.HMMIterationContext.expect(HMMIterationContext.kt:40)
        at org.jetbrains.bio.statistics.IterationContext.iterate(ClassificationModel.kt:290)
        at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$fit$1.invoke(MLAbstractHMM.kt:80)
        at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$fit$1.invoke(MLAbstractHMM.kt:17)
        at org.jetbrains.bio.statistics.hmm.MLAbstractHMM$sam$java_util_function_Consumer$0.accept(MLAbstractHMM.kt)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1376)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
        at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
dievsky commented 5 years ago

Fixed by 1efb244c0f11dccf93ff3588709735485151a957.