bigdatagenomics / adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

StackOverflowError in Avro SpecificDatumWriter #2349

Open heuermh opened 2 years ago

heuermh commented 2 years ago
$ spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 11.0.12

...
  at org.bdgenomics.adam.util.ReferenceMap$.apply(ReferenceMap.scala:108)
  at org.bdgenomics.adam.ds.ADAMContext.loadReferenceFile(ADAMContext.scala:3510)
  ... 49 elided
Caused by: java.lang.StackOverflowError
  at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:98)
  at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:210)
  at org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:83)
  at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:131)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
  at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:53)
  at org.bdgenomics.adam.serialization.AvroSerializer.write(ADAMKryoRegistrator.scala:42)
  at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
  at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
  at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
  at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
  at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
  at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
  at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)

heuermh commented 2 years ago

This also happens on EMR with the ADAM 0.36.0 release and Spark 3.1.2:

$ adam-submit --version

       e        888~-_         e            e    e
      d8b       888   \       d8b          d8b  d8b
     /Y88b      888    |     /Y88b        d888bdY88b
    /  Y88b     888    |    /  Y88b      / Y88Y Y888b
   /____Y88b    888   /    /____Y88b    /   YY   Y888b
  /      Y88b   888_-~    /      Y88b  /          Y888b

ADAM version: 0.36.0
Built for: Apache Spark 3.1.2, Scala 2.12.10, and Hadoop 3.2.1

$ adam-shell \
  --master yarn \
  --driver-memory 32g \
  --executor-memory 16g \
  --conf spark.driver.cores=12 \
  --conf spark.executor.cores=4

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2-amzn-1
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)

...

scala> val ref = sc.loadReferenceFile("references/Homo_sapiens/NCBI/build37.2/Sequence/WholeGenomeFasta/genome.fa", 10000L)
21/12/01 21:06:00 WARN TaskSetManager: Lost task 13.0 in stage 5.0 (TID 131) (ip-172-31-23-99.ec2.internal executor 6): java.lang.StackOverflowError
    at com.esotericsoftware.kryo.util.IdentityObjectIntMap.put(IdentityObjectIntMap.java:92)
    at com.esotericsoftware.kryo.util.MapReferenceResolver.addWrittenObject(MapReferenceResolver.java:41)
    at com.esotericsoftware.kryo.Kryo.writeReferenceOrNull(Kryo.java:681)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:570)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
...
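
Both traces show the same repeating cycle of `Kryo.writeObject`, `ObjectField.write`, and `FieldSerializer.write`, which is what Kryo's field-by-field recursion over a deeply nested object graph looks like. Kryo's own documentation notes that its serializers use the call stack for nested objects and that very deep graphs can overflow it. A possible mitigation to try, not confirmed anywhere in this thread, is to raise the JVM thread stack size on both the driver and the executors. The sketch below reuses the `adam-shell` invocation from above and assumes `adam-shell` forwards these standard Spark options the same way it forwards `--master` and `--driver-memory`. The `16m` value is an arbitrary starting point, not a tested setting.

```shell
# Hypothetical workaround, not verified against this issue:
# raise the per-thread stack size with the JVM -Xss flag.
# 16m is an assumed value; tune it for your cluster.
adam-shell \
  --master yarn \
  --driver-memory 32g \
  --executor-memory 16g \
  --driver-java-options "-Xss16m" \
  --conf spark.driver.cores=12 \
  --conf spark.executor.cores=4 \
  --conf spark.executor.extraJavaOptions=-Xss16m
```

A larger stack only hides the depth of the object graph being serialized. If the overflow goes away with this setting, that would suggest the recursion is deep but finite rather than a true cycle.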