broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Regenerate invalid CRAM test files #6018

Open cmnbroad opened 5 years ago

cmnbroad commented 5 years ago

There are quite a few v2.1 CRAM test files being used in GATK that should probably be regenerated and replaced with v3.0 files.
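As a rough way to find which test files are still pre-3.0, the sketch below reads the CRAM file definition, which begins with the 4-byte magic `CRAM` followed by one byte each for the major and minor version. The function names `read_cram_version` and `find_old_crams` are hypothetical, and this is not the metadata tool mentioned later in this thread. It only checks the version bytes and does not detect the content ID=0 block problem.

```python
from pathlib import Path

CRAM_MAGIC = b"CRAM"  # first four bytes of every CRAM file definition


def read_cram_version(path):
    """Return (major, minor) from the CRAM file definition, or None if not CRAM."""
    with open(path, "rb") as handle:
        header = handle.read(6)  # magic (4 bytes) + major (1) + minor (1)
    if len(header) < 6 or header[:4] != CRAM_MAGIC:
        return None
    return header[4], header[5]


def find_old_crams(root, minimum=(3, 0)):
    """List (path, version) for every *.cram under root older than `minimum`."""
    old = []
    for path in sorted(Path(root).rglob("*.cram")):
        version = read_cram_version(path)
        if version is not None and version < minimum:
            old.append((str(path), version))
    return old


if __name__ == "__main__":
    # Assumed location: run from a directory containing the gatk checkout.
    for name, (major, minor) in find_old_crams("gatk/src/test/resources"):
        print(f"{name}\tv{major}.{minor}")
```

Files that are not CRAM at all (wrong magic or fewer than six bytes) are skipped rather than reported, so the output lists only genuine CRAM files below the chosen minimum version.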

There are also quite a few CRAM test files in both htsjdk and GATK that have external blocks with content ID=0 (not valid per the spec), and some of those blocks have no actual content. The GATK files with ID=0 external blocks are:

gatk/src/test/resources/large/CEUTrio.HiSeq.WGS.b37.NA12878.20.21.cram
gatk/src/test/resources/org/broadinstitute/hellbender/tools/ClipReads/expected.clippingReadsTestCRAM.QT_10.cram
gatk/src/test/resources/org/broadinstitute/hellbender/tools/ClipReads/clippingReadsTestCRAM.cram
gatk/src/test/resources/org/broadinstitute/hellbender/tools/print_reads.sorted.queryname.htsjdk-2.1.0.cram
gatk/src/test/resources/org/broadinstitute/hellbender/tools/BQSR/CEUTrio.HiSeq.WGS.b37.NA12878.20.21.10m-10m100.cram
gatk/src/test/resources/org/broadinstitute/hellbender/tools/BQSR/expected.MultiSite.bqsr.pipeline.cram
gatk/src/test/resources/org/broadinstitute/hellbender/tools/BQSR/CEUTrio.HiSeq.WGS.b37.ch20.1m-1m1k.NA12878.noMD.noBQSR.cram
gatk/src/test/resources/org/broadinstitute/hellbender/tools/BQSR/expected.MultiSite.reads.pipeline.cram
gatk/src/test/resources/org/broadinstitute/hellbender/tools/validation/single.read.cram
gatk/src/test/resources/org/broadinstitute/hellbender/tools/validation/another.single.read.cram

These have external blocks with ID=0, but the blocks have no actual content:

gatk/src/test/resources/org/broadinstitute/hellbender/engine/cram_with_crai_index.cram (0 bytes)
gatk/src/test/resources/org/broadinstitute/hellbender/engine/cram_with_bai_index.cram (0 bytes)

We should regenerate these files and replace them with v3.0 CRAM files.

cmnbroad commented 5 years ago

This list was generated using a not-yet-merged version of my CRAM metadata tool, which uses my not-yet-merged refactored CRAM code.

cmnbroad commented 5 years ago

Corresponding htsjdk issue is https://github.com/samtools/htsjdk/issues/1232.