Open lbergelson opened 5 years ago
We've seen this at least twice now. Seems like something we should investigate.
Saw it again here (now restarted). If I'm reading the serialization stack in the right order:
Serialization trace:
classes (sun.misc.Launcher$AppClassLoader)
classLoader (org.apache.hadoop.conf.Configuration)
conf (org.apache.hadoop.hdfs.DistributedFileSystem)
fs (hdfs.jsr203.HadoopFileSystem)
hdfs (hdfs.jsr203.HadoopPath)
path (htsjdk.samtools.seekablestream.SeekablePathStream)
seekableStream (htsjdk.tribble.TribbleIndexedFeatureReader)
featureReader (org.broadinstitute.hellbender.engine.FeatureDataSource)
featureSources (org.broadinstitute.hellbender.engine.FeatureManager)
it looks like we're trying to serialize a ClassLoader. The FieldSerializer does appear to use a ClassLoader to load classes during serialization.
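One possible reason this is intermittent, offered as an assumption rather than something confirmed in this thread: on Java 8 the `classes` field at the top of the trace is a `Vector` that the class loader appends to whenever a class is loaded. If a serializer is iterating that vector while another thread loads a class, the fail-fast iterator would throw `ConcurrentModificationException`. The sketch below reproduces only that mechanism, single-threaded and with strings standing in for classes; `LiveCollectionDemo` is a hypothetical name and none of this is GATK or Kryo code.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Vector;

// Hypothetical demo class; it only illustrates the fail-fast iterator behavior.
class LiveCollectionDemo {

    /**
     * Iterates the list the way an element-by-element serializer would,
     * while appending to it on every step.
     * Returns true if the iteration was aborted by a ConcurrentModificationException.
     */
    static boolean failsWhenMutatedDuringIteration(List<String> classes) {
        try {
            for (String name : classes) {
                // Stands in for another thread loading a class mid-serialization.
                classes.add(name + "$Inner");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> classes = new Vector<>(Arrays.asList("A", "B"));
        System.out.println(failsWhenMutatedDuringIteration(classes));
    }
}
```

If this is what is happening, it would also explain why the failure depends on which other tests happen to be loading classes at the same moment.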
Seems like I'm getting this in almost 50% of my builds. Master is failing because of it; will restart.
I haven't been able to reproduce this locally by running ReadsPipelineSparkIntegrationTest repeatedly. In fact, it looks like another test is interacting with this one, since the stack trace references HDFS paths, but this test doesn't use HDFS at all.
Another oddity: TribbleIndexedFeatureReader implies it's reading a vcf.idx file, but HaplotypeCallerSpark, where the exception occurs, is not reading any VCF files (although BQSR does earlier in the pipeline for known sites).
Also, we shouldn't be serializing FeatureDataSource objects with remote resources any more, since we use Spark --files to copy them to the worker nodes (see https://github.com/broadinstitute/gatk/blob/master/src/main/java/org/broadinstitute/hellbender/engine/spark/GATKSparkTool.java#L699). So we shouldn't be seeing a FeatureDataSource with an HDFS path being serialized at all.
Has anyone seen this running locally on their machine, or only on Travis?
I think this is happening because we're trying to serialize the class loader (sun.misc.Launcher$AppClassLoader), which appears to be reached through the object graph via https://github.com/damiencarol/jsr203-hadoop/blob/master/src/main/java/hdfs/jsr203/HadoopFileSystem.java#L82. We probably need to short-circuit that with a custom serializer for one of these:
Serialization trace:
classes (sun.misc.Launcher$AppClassLoader)
classLoader (org.apache.hadoop.conf.Configuration)
conf (org.apache.hadoop.hdfs.DistributedFileSystem)
fs (hdfs.jsr203.HadoopFileSystem)
hdfs (hdfs.jsr203.HadoopPath)
path (htsjdk.samtools.seekablestream.SeekablePathStream)
seekableStream (htsjdk.tribble.TribbleIndexedFeatureReader)
featureReader (org.broadinstitute.hellbender.engine.FeatureDataSource)
featureSources (org.broadinstitute.hellbender.engine.FeatureManager)
See, for instance, https://github.com/dbpedia/distributed-extraction-framework/issues/9.
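To make the short-circuit idea concrete, here is a minimal sketch of the general pattern: serialize only the lightweight description of a resource (its URI) and rebuild the live handle lazily on the receiving side. It uses plain `java.io` serialization rather than Kryo, so it is an analogy and not the proposed fix. `PathBackedSource` and `LiveHandle` are hypothetical names, not GATK or htsjdk classes. My understanding is that Kryo's FieldSerializer also skips `transient` fields by default, but that should be checked against the Kryo version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a feature source that wraps a remote resource.
class PathBackedSource implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical stand-in for a live reader / FileSystem; deliberately NOT Serializable.
    static final class LiveHandle {
        final String uri;
        LiveHandle(String uri) { this.uri = uri; }
    }

    private final String uri;              // the only state that gets written
    private transient LiveHandle handle;   // skipped by serialization, rebuilt on demand

    PathBackedSource(String uri) { this.uri = uri; }

    String uri() { return uri; }

    boolean isOpen() { return handle != null; }

    // Opens the handle on first use, so a deserialized copy reopens it locally.
    LiveHandle handle() {
        if (handle == null) {
            handle = new LiveHandle(uri);
        }
        return handle;
    }

    // Serializes and deserializes in memory, mimicking a trip to a worker.
    static PathBackedSource roundTrip(PathBackedSource source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(source);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PathBackedSource) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

Because the handle never enters the serialized graph, nothing it references (a file system, a Configuration, a class loader) can be reached by the serializer. A custom Kryo serializer registered for one of the classes in the trace would aim for the same effect.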
We've seen at least one non-deterministic ConcurrentModificationException while running ReadsPipelineSparkIntegrationTest.testReadsPipelineSpark[5].
It seems like there is a race condition somewhere.
@jamesemery @tomwhite I've seen this once, so it may be a super rare one that we're just hitting now, or something newly introduced. Not sure there's anything to do until we see it more often, but thought I'd record it in case it keeps coming back.