broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

fix NegativeArraySizeException in ReadsSparkPipeline #1524

Open lbergelson opened 8 years ago

lbergelson commented 8 years ago

We have multiple reports of a NegativeArraySizeException being thrown during Kryo serialization while running ReadsPipelineSpark.

An example:

./gatk/gatk-launch \
ReadsPipelineSpark \
-I $bam \
-R $ref \
--programName ${name} \
-O $bamout \
--bamPartitionSize 134217728 \
--knownSites $dbsnp \
--shardedOutput true \
--emit_original_quals \
--duplicates_scoring_strategy SUM_OF_BASE_QUALITIES \
-- \
--sparkRunner LOCAL

resulted in:

com.esotericsoftware.kryo.KryoException: java.lang.NegativeArraySizeException
Serialization trace:
vs (org.broadinstitute.hellbender.utils.collections.IntervalsSkipListOneContig)
intervals (org.broadinstitute.hellbender.utils.collections.IntervalsSkipList)
at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585)
at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:82)
lbergelson commented 8 years ago

According to collaborators, this is caused by a bug in Kryo when running on JDK 8, and it can be fixed either by:

  1. patching Kryo, or
  2. setting a JVM argument of some sort (which argument that is, we don't know yet).

I'll update this when I get more information.

lbergelson commented 8 years ago

The Kryo issue tracking this is here: https://github.com/EsotericSoftware/kryo/issues/382

The temporary fix is to run with the following set:

spark.executor.extraJavaOptions -XX:hashCode=0
spark.driver.extraJavaOptions -XX:hashCode=0
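
For anyone setting this programmatically rather than through spark-defaults.conf, here is a minimal Java sketch of the same workaround (illustrative only; note that spark.driver.extraJavaOptions generally has to be in place before the driver JVM starts, so for the driver the option is best passed via spark-defaults.conf or on the spark-submit command line):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Illustrative sketch of the -XX:hashCode=0 workaround applied in code.
// The executor option takes effect when executor JVMs launch with this
// conf; the driver option must already be set when the driver starts.
public class KryoHashCodeWorkaround {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("kryo-hashcode-workaround-example")
                .setMaster("local[*]") // for local testing only
                .set("spark.executor.extraJavaOptions", "-XX:hashCode=0")
                .set("spark.driver.extraJavaOptions", "-XX:hashCode=0");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // run the pipeline as usual
        }
    }
}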
droazen commented 8 years ago

alpha-2

akiezun commented 8 years ago

@lbergelson is this still an issue? We're on Kryo 3.0.3 now

lbergelson commented 8 years ago

It's still an issue, unfortunately. The changes we need are merged upstream, but the latest release, 3.0.3, came out nearly a year ago and doesn't include them. I've asked for an official new release (https://github.com/EsotericSoftware/kryo/issues/431); we'll see if they respond. We could build and publish our own version, but that seems like an unfortunate thing to have to do, and I assume Cloudera wouldn't incorporate it into their distribution.

lbergelson commented 8 years ago

Kryo 4.0.0 has been released, including the changes we need. Now we need to figure out how to get it onto our clusters.

droazen commented 7 years ago

@lbergelson Are we using a version of Kryo with the fix?

lbergelson commented 7 years ago

We're not yet using a Kryo version with the fix. We can open a ticket against Spark to see if someone will update it, but I suspect it will be a long slog to get it in.

geoHeil commented 7 years ago

I happened to see the same problem and created an issue against Spark here: https://issues.apache.org/jira/browse/SPARK-20389

sooheelee commented 7 years ago

What is the status of resolving this bug? Is it fixed in GATK 4.beta.2?

droazen commented 7 years ago

Not fixed yet, targeted for 4.0 general release.

sooheelee commented 7 years ago

Ok. Thanks for that.

danking commented 5 years ago

FYI, Spark 2.4.0 upgraded to Kryo 4.0.0 in commit 3e033035. There does not appear to be a backport to older Spark versions.

Did y'all happen to also try this:

Kryo kryo = new Kryo();
kryo.setReferences(false);

I'd like to avoid asking Hail users to set JVM options, so this approach is appealing to me. Curious if you all had experience trying it out.
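
For reference, a hedged sketch of how that might be wired into a Spark job via a custom KryoRegistrator (the class name here is made up for illustration). Spark's built-in spark.kryo.referenceTracking=false setting achieves the same effect with no custom code at all:

import com.esotericsoftware.kryo.Kryo;
import org.apache.spark.serializer.KryoRegistrator;

// Hypothetical registrator: Spark calls registerClasses() on every Kryo
// instance it creates, so this disables reference tracking everywhere.
// Enable it by setting spark.kryo.registrator to this class's full name.
public class NoReferencesRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.setReferences(false);
    }
}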

lbergelson commented 5 years ago

Oh, that's great that it's fixed in 2.4.0. I think we'd sort of given up on this since we had a workaround. I think setReferences(false) should work, but it could badly inflate the sizes of your serialized objects if you're using Kryo's field serializer on repetitive data (see the sketch below).
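
A hedged demo of that tradeoff, assuming Kryo 4.x defaults (where class registration is optional): a list holding the same array a thousand times serializes compactly with reference tracking on, because later occurrences become back-references, but repeats the full payload for every slot once references are disabled.

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReferenceTrackingSizeDemo {
    public static void main(String[] args) {
        // 1,000 references to one and the same 100-int array.
        List<int[]> repetitive =
                new ArrayList<>(Collections.nCopies(1_000, new int[100]));
        System.out.println("references on:  " + serializedSize(true, repetitive));
        System.out.println("references off: " + serializedSize(false, repetitive));
        // Note: with references off, cyclic object graphs no longer
        // serialize at all; they recurse until the stack overflows.
    }

    private static int serializedSize(boolean references, Object obj) {
        Kryo kryo = new Kryo();
        kryo.setReferences(references);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Output out = new Output(bytes)) {
            kryo.writeObject(out, obj);
        }
        return bytes.size();
    }
}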

We should see about upgrading to 2.4.0... It looks like there's a Dataproc preview with it that we could use. @tomwhite Any thoughts on whether or not to upgrade to 2.4.0?

tomwhite commented 5 years ago

We'd need to test GATK on a 2.4 cluster, but I don't see why we wouldn't upgrade. I would like to wait until 2.4.1 (which should be out soon), as it enables testing on Java 11 too. See #5782.

tomwhite commented 5 years ago

This should be fixed now that we're using Spark 2.4. Is anyone able to check?