VCFHeaderReader uses a try/catch to fall back to the BCF codec, which produces a misleading error message and stack trace when the header really is VCF but fails to parse for an unrelated reason.
For example, in the session below the first exception (Your input file has a malformed header: Count < 0 for fixed size VCF header field BAD_PS) should have been thrown rather than logged as a warning, and the second exception (Input stream does not contain a BCF encoded file; BCF magic header info not found, at record 0 with position 0) should never have occurred.
scala> val variants = sc.loadVariants("truth_small_variants.variants.adam")
warning: while trying to read VCF header from file received exception: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: Count < 0 for fixed size VCF header field BAD_PS
htsjdk.tribble.TribbleException: Input stream does not contain a BCF encoded file; BCF magic header info not found, at record 0 with position 0:
at htsjdk.variant.bcf2.BCF2Codec.error(BCF2Codec.java:478)
at htsjdk.variant.bcf2.BCF2Codec.readHeader(BCF2Codec.java:149)
at org.seqdoop.hadoop_bam.util.VCFHeaderReader.readHeaderFrom(VCFHeaderReader.java:67)
at org.bdgenomics.adam.rdd.ADAMContext.org$bdgenomics$adam$rdd$ADAMContext$$readVcfHeader(ADAMContext.scala:228)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadHeaderLines$1.apply(ADAMContext.scala:234)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadHeaderLines$1.apply(ADAMContext.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.bdgenomics.adam.rdd.ADAMContext.loadHeaderLines(ADAMContext.scala:234)
at org.bdgenomics.adam.rdd.ADAMContext.loadParquetVariants(ADAMContext.scala:1175)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVariants$1.apply(ADAMContext.scala:1733)
at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadVariants$1.apply(ADAMContext.scala:1728)
at scala.Option.fold(Option.scala:158)
at org.apache.spark.rdd.Timer.time(Timer.scala:48)
at org.bdgenomics.adam.rdd.ADAMContext.loadVariants(ADAMContext.scala:1726)
... 50 elided
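One possible way to avoid the misleading fallback is to decide the format up front by peeking at the leading bytes, so that a VCF parse failure is rethrown as-is instead of being swallowed. The sketch below is illustrative only and is not the hadoop-bam implementation: `HeaderFormatSniffer`, `Format`, and `sniff` are hypothetical names, and it assumes the stream is already decompressed (BCF files are normally BGZF-compressed) and supports mark/reset. It relies only on the fact that BCF2 data begins with the ASCII bytes `BCF`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of hadoop-bam or htsjdk.
class HeaderFormatSniffer {
    enum Format { BCF, VCF }

    // BCF2 streams start with the ASCII bytes "BCF" followed by version bytes.
    private static final byte[] BCF_MAGIC = "BCF".getBytes(StandardCharsets.US_ASCII);

    // Peeks at the first bytes and rewinds, so the chosen codec still sees the
    // whole stream. Assumes `in` is decompressed and supports mark/reset.
    static Format sniff(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(BCF_MAGIC.length);
            byte[] buf = new byte[BCF_MAGIC.length];
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) {
                    break; // stream shorter than the magic: cannot be BCF
                }
                n += r;
            }
            in.reset();
            boolean isBcf = n == BCF_MAGIC.length && Arrays.equals(buf, BCF_MAGIC);
            return isBcf ? Format.BCF : Format.VCF;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] vcfBytes = "##fileformat=VCFv4.2\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(new ByteArrayInputStream(vcfBytes)));
    }
}
```

With a check like this, a caller would hand the stream to the VCF codec only when `sniff` reports `VCF` and let any `TribbleException` from that codec propagate unchanged, so the "malformed header" message above would reach the user directly.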