bigdatagenomics / adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Apache License 2.0
1k stars 308 forks source link

Parquet/Avro schema mismatch in transformFragments #2387

Closed heuermh closed 10 months ago

heuermh commented 1 year ago
$ adam-submit transformFragments small.alignments.adam small.fragments.adam
...
23/05/18 09:43:57 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch. Avro field 'referenceName' not found.
    at org.apache.parquet.avro.AvroIndexedRecordConverter.getAvroField(AvroIndexedRecordConverter.java:135)
    at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:91)
    at org.apache.parquet.avro.AvroIndexedRecordConverter.<init>(AvroIndexedRecordConverter.java:66)
    at org.apache.parquet.avro.AvroCompatRecordMaterializer.<init>(AvroCompatRecordMaterializer.java:34)
    at org.apache.parquet.avro.AvroReadSupport.newCompatMaterializer(AvroReadSupport.java:148)
    at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:140)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:213)
    at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
    at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:221)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:218)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:173)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:72)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1623)
heuermh commented 10 months ago

Closing as not a valid issue.

To convert alignments to fragments, use -load_as_alignments option, e.g.

$ adam-submit transformFragments \
    -load_as_alignments \
    small.alignments.adam \
    small.fragments.adam