broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.68k stars 587 forks source link

CRAM reading gives MD5 mismatch with correct reference that samtools can decram fine #3154

Closed sooheelee closed 6 years ago

sooheelee commented 7 years ago

This CRAM file and its reference are identical to those I have in the cloud because I uploaded these exact files to the cloud after downloading them from the 1000 Genomes Project FTP site. Back then, in the cloud, I was able to use samtools to decram and index this CRAM file alongside 39 others.

On our local server, I cannot get readwalkers PrintReads nor CalculateTargetCoverage to correctly decipher the CRAM. Both tools give the same error.

Here is the PrintReads command:

/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.alpha.2-1134-ga9d9d91-SNAPSHOT/gatk-launch \
  PrintReads \
  -R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa \
  -I /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/cram/HG02759.alt_bwamem_GRCh38DH.20150826.GWD.exome.cram \
  -O HG02759.alt_bwamem_GRCh38DH.20150826.GWD.exome.bam

And here is the error:

17:47:15.362 INFO  ProgressMeter -       chr1:198467627              2.6               8432000        3202552.3
17:47:25.402 INFO  ProgressMeter -       chr1:236860077              2.8              10019000        3577916.1
ERROR   2017-06-22 17:47:27     Slice   Reference MD5 mismatch for slice 0:248681942-248858764, ATAGCGGTCA...AGTGGCGGTG
17:47:27.292 INFO  CalculateTargetCoverage - Shutting down engine
[June 22, 2017 5:47:27 PM EDT] org.broadinstitute.hellbender.tools.exome.CalculateTargetCoverage done. Elapsed time: 2.87 minutes.
Runtime.totalMemory()=10377756672
htsjdk.samtools.cram.CRAMException: Reference sequence MD5 mismatch for slice: sequence id 0, start 248681942, span 176823, expected MD5 4b8526e90896b01860301e5a1ef4988b
        at htsjdk.samtools.CRAMIterator.nextContainer(CRAMIterator.java:187)
        at htsjdk.samtools.CRAMIterator.hasNext(CRAMIterator.java:261)
        at htsjdk.samtools.SamReader$AssertingIterator.hasNext(SamReader.java:601)
        at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.hasNext(SAMRecordToReadIterator.java:24)
        at java.util.Iterator.forEachRemaining(Iterator.java:115)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:779)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:122)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:143)
        at org.broadinstitute.hellbender.Main.main(Main.java:221)

This error occurs fairly soon after launching the job, after the progress meter shows the tool iterating over chromosome 1.

sooheelee commented 7 years ago

@cmnbroad Can you please have a look?

cmnbroad commented 7 years ago

@sooheelee Since samtools can decode it, it sounds exactly like https://github.com/samtools/htsjdk/issues/800, for which we have a two part fix (this and this) that is very close to being ready.

sooheelee commented 7 years ago

Ok. Very excited to have the feature work.

droazen commented 7 years ago

@sooheelee Are there ambiguity codes in the reads/reference near the locus where it fails?

sooheelee commented 7 years ago

I haven't checked @droazen. Is this important for your work?

Sorry for the delay in answering. I've been busy with the workshop somatic CNV tutorial writing.

cmnbroad commented 7 years ago

Fix is merged in htsjdk now https://github.com/samtools/htsjdk/pull/889. I'll keep this open until we update GATK with a new htsjdk.

cmnbroad commented 6 years ago

@Sooheelee This should be fixed and I'd like to close it - can you verify that it works for your case when you get a chance ?

sooheelee commented 6 years ago

Just back from workshop/vacay. I will take a look.

sooheelee commented 6 years ago

Hmm, looks like I deleted the CRAM /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/cram/HG02759.alt_bwamem_GRCh38DH.20150826.GWD.exome.cram. (I was a high storage cost offender and asked to clean up.) Luckily, I have another CRAM available that I left for testing purposes. I can recapitulate the error with this other CRAM using the old gatk-4.alpha.2-1134-ga9d9d91-SNAPSHOT jar.

-bash-4.1$ /humgen/gsa-hpprojects/GATK/gatk4/gatk-4.alpha.2-1134-ga9d9d91-SNAPSHOT/gatk-launch   PrintReads   -R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa   -I /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram -O /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/test_decram_20171002.bam

Gives the same error at an approximately similar region:

...
10:50:54.603 INFO  ProgressMeter -       chr1:224042054              2.2              11004000        5016945.0
10:51:04.609 INFO  ProgressMeter -       chr1:248061327              2.4              11905000        5044206.5
ERROR   2017-10-02 10:51:05 Slice   Reference MD5 mismatch for slice 0:248574592-248771907, AGTGGATGAG...TGTCGGTATG
10:51:06.940 INFO  PrintReads - Shutting down engine
[October 2, 2017 10:51:06 AM EDT] org.broadinstitute.hellbender.tools.PrintReads done. Elapsed time: 2.45 minutes.
Runtime.totalMemory()=5995233280
htsjdk.samtools.cram.CRAMException: Reference sequence MD5 mismatch for slice: sequence id 0, start 248574592, span 197316, expected MD5 cc8ace0545facc11349da783af07a076
    at htsjdk.samtools.CRAMIterator.nextContainer(CRAMIterator.java:187)
    at htsjdk.samtools.CRAMIterator.hasNext(CRAMIterator.java:261)
    at htsjdk.samtools.SamReader$AssertingIterator.hasNext(SamReader.java:601)
    at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.hasNext(SAMRecordToReadIterator.java:24)
    at java.util.Iterator.forEachRemaining(Iterator.java:115)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:779)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:122)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:143)
    at org.broadinstitute.hellbender.Main.main(Main.java:221)

Now using the latest jar gatk-4.beta.5 in the same command:

-bash-4.1$ /humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-launch   PrintReads   -R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa   -I /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram -O /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/test_decram_20171002.bam

I see that the run goes past the problem region.

...
10:56:39.570 INFO  ProgressMeter -       chr1:179846959              2.2               9271000        4228249.1
10:56:49.572 INFO  ProgressMeter -       chr1:198702508              2.4               9889000        4191438.3
10:56:59.631 INFO  ProgressMeter -       chr1:219611298              2.5              10821000        4282152.8
10:57:09.660 INFO  ProgressMeter -       chr1:242292327              2.7              11691000        4339428.9
10:57:19.665 INFO  ProgressMeter -         chr2:6849819              2.9              12102000        4230137.4
10:57:29.667 INFO  ProgressMeter -        chr2:26448455              3.0              12672000        4185516.5
10:57:39.696 INFO  ProgressMeter -        chr2:48688700              3.2              13582000        4251349.9
10:57:49.698 INFO  ProgressMeter -        chr2:74009861              3.4              14355000        4270478.5
...

I'll let this finish (and post again if there is an error), but I suspect the issue is solved. Thank you @cmnbroad for the fix.

sooheelee commented 6 years ago

All good:

11:15:37.269 INFO  ProgressMeter -       chrX:102154117             21.2             105064000        4966574.8
11:15:47.308 INFO  ProgressMeter -       chrX:148052583             21.3             105984000        4970730.0
11:15:57.311 INFO  ProgressMeter - chr1_KI270711v1_random:8733             21.5             106809000        4970557.7
11:16:07.348 INFO  ProgressMeter - chr1_KI270763v1_alt:521207             21.7             107601000        4968734.1
11:16:17.392 INFO  ProgressMeter - chr7_KI270803v1_alt:794832             21.8             108422000        4968240.7
11:16:27.410 INFO  ProgressMeter - chr16_KI270853v1_alt:1992946             22.0             109486000        4978907.0
11:16:37.438 INFO  ProgressMeter - chr19_KI270885v1_alt:18015             22.2             110571000        4990315.4
11:16:47.479 INFO  ProgressMeter - chr6_GL000251v2_alt:3288752             22.3             111608000        4999358.0
11:16:57.479 INFO  ProgressMeter - chr19_KI270915v1_alt:116858             22.5             112656000        5008907.2
11:17:07.482 INFO  ProgressMeter - chr6_GL000252v2_alt:3648002             22.7             113732000        5019540.7
11:17:17.485 INFO  ProgressMeter - chr19_GL000209v2_alt:107854             22.8             114378000        5011183.1
11:17:27.485 INFO  ProgressMeter - chr6_GL000254v2_alt:4706141             23.0             115453000        5021609.7
11:17:37.489 INFO  ProgressMeter - chr6_GL000256v2_alt:3229060             23.2             116516000        5031360.7
11:17:47.495 INFO  ProgressMeter - chrUn_KN707963v1_decoy:59481             23.3             117352000        5031225.8
11:17:57.500 INFO  ProgressMeter -    HLA-A*02:53N:2725             23.5             117662000        5008712.4
11:18:02.114 INFO  PrintReads - No reads filtered by: WellformedReadFilter
11:18:02.114 INFO  ProgressMeter -             unmapped             23.6             118064258        5009433.9
11:18:02.114 INFO  ProgressMeter - Traversal complete. Processed 118064258 total reads in 23.6 minutes.
11:18:05.615 INFO  PrintReads - Shutting down engine
[October 2, 2017 11:18:05 AM EDT] org.broadinstitute.hellbender.tools.PrintReads done. Elapsed time: 23.67 minutes.
Runtime.totalMemory()=11493965824

And I can view the BAM with Samtools:

samtools view /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/test_decram_20171002.bam | less
SRR070790.20928984      99      chr1    10004   0       3S97M   =       10036   116     AACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC    <<<<<BFBBBBBFBBBBBBBBBBBFBBBBBFBBBBBBBBBBBBBBBBBBBBBBBBBBBBBFBBBBBBBBB7B<BBBB<F<B<<BBBB<<BFB<770<<<7    MC:Z:61M1D22M17S        RG:Z:SRR070790  MQ:i:0  AS:i:97 XS:i:100
SRR070503.24638125      145     chr1    10008   0       100M    chr6    8088710 0       AGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC    ''''''''''''BBB<<<<7<7<<BBBBFBBBFFBBBBBBFBFBBBFBBBBBFBFBBBFBFBBBF<BBBBFBBBBBFBBBBBFBBBBBFBBBBBB<<<<<    RG:Z:SRR070503  MQ:i:60 AS:i:98 XS:i:98
SRR070790.26540140      65      chr1    10016   0       20M3D9M3D13M1I57M       chr4    190122739       0       CCCTAACCCTAACCACAACCACCCTAACCACCCT
...
cmnbroad commented 6 years ago

@sooheelee Thanks for digging that up and doing that verification.

ldgauthier commented 6 years ago

So at this point are we free to run GATK tools on CRAMs without reverting to BAMs first?

sooheelee commented 6 years ago

No problem Chris. I'm going to assume we can run on CRAMs any tool that takes a BAM. I will be testing this and if I find otherwise, I'll let the team know.

cmnbroad commented 6 years ago

@ldgauthier @sooheelee AFAIK, this was the main blocking issue. One caveat though - last week we replaced all of the copies of the Picard tools in GATK with the ACTUAL tools from the Picard jar. The GATK copies were mostly CRAM enabled and tested, but I think the Picard ones have various mixed states wrt read/write capabilities for CRAM.

ldgauthier commented 6 years ago

Supposing I wanted to re-run HaplotypeCaller from the CRAM, is that a tool that's been tested with CRAM?

sooheelee commented 6 years ago

Related issue https://github.com/broadinstitute/gatk/issues/3669 from testing CRAM usage.

cmnbroad commented 6 years ago

I doubt much CRAM testing has been done with HaplotypeCaller, and I don't see any integration tests for it using CRAM. FWIW, since we have a CRAM version of the BAM we use for those tests, I did just try running them on the CRAM and they passed. As SooHee mentioned though, you'll have issues if the reference you use is in a GCS bucket, or if you specify intervals and the CRAM (and it's index) are in a bucket.

sooheelee commented 6 years ago

Yes, I have also just tested HaplotypeCaller on a CRAM and it seems to run fine locally.

/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-launch HaplotypeCaller \
    -R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa \
    -I HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram \
    -O HG00190_cram_HC.vcf \
    -L chr17

gives

Using GATK jar /humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-package-4.beta.5-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true -jar /humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-package-4.beta.5-local.jar HaplotypeCaller -R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa -I HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram -O HG00190_cram_HC.vcf -L chr17
14:50:16.528 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/intel/gkl/native/libgkl_compression.so
[October 6, 2017 2:50:16 PM EDT] HaplotypeCaller  --output HG00190_cram_HC.vcf --intervals chr17 --input HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram --reference /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa  --group StandardAnnotation --group StandardHCAnnotation --GVCFGQBands 1 --GVCFGQBands 2 --GVCFGQBands 3 --GVCFGQBands 4 --GVCFGQBands 5 --GVCFGQBands 6 --GVCFGQBands 7 --GVCFGQBands 8 --GVCFGQBands 9 --GVCFGQBands 10 --GVCFGQBands 11 --GVCFGQBands 12 --GVCFGQBands 13 --GVCFGQBands 14 --GVCFGQBands 15 --GVCFGQBands 16 --GVCFGQBands 17 --GVCFGQBands 18 --GVCFGQBands 19 --GVCFGQBands 20 --GVCFGQBands 21 --GVCFGQBands 22 --GVCFGQBands 23 --GVCFGQBands 24 --GVCFGQBands 25 --GVCFGQBands 26 --GVCFGQBands 27 --GVCFGQBands 28 --GVCFGQBands 29 --GVCFGQBands 30 --GVCFGQBands 31 --GVCFGQBands 32 --GVCFGQBands 33 --GVCFGQBands 34 --GVCFGQBands 35 --GVCFGQBands 36 --GVCFGQBands 37 --GVCFGQBands 38 --GVCFGQBands 39 --GVCFGQBands 40 --GVCFGQBands 41 --GVCFGQBands 42 --GVCFGQBands 43 --GVCFGQBands 44 --GVCFGQBands 45 --GVCFGQBands 46 --GVCFGQBands 47 --GVCFGQBands 48 --GVCFGQBands 49 --GVCFGQBands 50 --GVCFGQBands 51 --GVCFGQBands 52 --GVCFGQBands 53 --GVCFGQBands 54 --GVCFGQBands 55 --GVCFGQBands 56 --GVCFGQBands 57 --GVCFGQBands 58 --GVCFGQBands 59 --GVCFGQBands 60 --GVCFGQBands 70 --GVCFGQBands 80 --GVCFGQBands 90 --GVCFGQBands 99 --indelSizeToEliminateInRefModel 10 --useAllelesTrigger false --dontTrimActiveRegions false --maxDiscARExtension 25 --maxGGAARExtension 300 --paddingAroundIndels 150 --paddingAroundSNPs 20 --kmerSize 10 --kmerSize 25 --dontIncreaseKmerSizesForCycles false --allowNonUniqueKmersInRef false --numPruningSamples 1 --recoverDanglingHeads false --doNotRecoverDanglingBranches false --minDanglingBranchLength 4 --consensus false --maxNumHaplotypesInPopulation 128 --errorCorrectKmers false --minPruning 2 --debugGraphTransformations false --kmerLengthForReadErrorCorrection 25 --minObservationsForKmerToBeSolid 20 --likelihoodCalculationEngine PairHMM --base_quality_score_threshold 18 --gcpHMM 10 --pair_hmm_implementation FASTEST_AVAILABLE --pcr_indel_model CONSERVATIVE --phredScaledGlobalReadMismappingRate 45 --nativePairHmmThreads 4 --useDoublePrecision false --debug false --useFilteredReadsForAnnotations false --emitRefConfidence NONE --bamWriterType CALLED_HAPLOTYPES --disableOptimizations false --justDetermineActiveRegions false --dontGenotype false --dontUseSoftClippedBases false --captureAssemblyFailureBAM false --errorCorrectReads false --doNotRunPhysicalPhasing false --min_base_quality_score 10 --useNewAFCalculator false --annotateNDA false --heterozygosity 0.001 --indel_heterozygosity 1.25E-4 --heterozygosity_stdev 0.01 --standard_min_confidence_threshold_for_calling 10.0 --max_alternate_alleles 6 --max_genotype_count 1024 --sample_ploidy 2 --genotyping_mode DISCOVERY --contamination_fraction_to_filter 0.0 --output_mode EMIT_VARIANTS_ONLY --allSitePLs false --readShardSize -1 --readShardPadding 100 --minAssemblyRegionSize 50 --maxAssemblyRegionSize 300 --assemblyRegionPadding 100 --maxReadsPerAlignmentStart 50 --activeProbabilityThreshold 0.002 --maxProbPropagationDistance 50 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20 --disableToolDefaultReadFilters false --minimumMappingQuality 20
[October 6, 2017 2:50:16 PM EDT] Executing as shlee@gsa5.broadinstitute.org on Linux 2.6.32-642.15.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Version: 4.beta.5
14:50:16.818 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
14:50:16.818 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:50:16.818 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:50:16.818 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:50:16.818 INFO  HaplotypeCaller - Deflater: IntelDeflater
14:50:16.818 INFO  HaplotypeCaller - Inflater: IntelInflater
14:50:16.818 INFO  HaplotypeCaller - GCS max retries/reopens: 20
14:50:16.818 INFO  HaplotypeCaller - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
14:50:16.819 INFO  HaplotypeCaller - Initializing engine
14:50:18.950 INFO  IntervalArgumentCollection - Processing 83257441 bp from intervals
14:50:18.965 INFO  HaplotypeCaller - Done initializing engine
14:50:19.021 INFO  HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
14:50:19.280 WARN  PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
14:50:19.481 WARN  PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
14:50:19.776 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/intel/gkl/native/libgkl_utils.so
14:50:19.795 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
14:50:19.847 WARN  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
14:50:19.848 INFO  IntelPairHmm - Available threads: 48
14:50:19.848 INFO  IntelPairHmm - Requested threads: 4
14:50:19.848 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
14:50:19.926 INFO  ProgressMeter - Starting traversal
14:50:19.926 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Regions Processed   Regions/Minute
14:50:30.309 INFO  ProgressMeter -         chr17:740224              0.2                  3010          17395.5
14:50:41.016 INFO  ProgressMeter -        chr17:1675683              0.4                  7020          19973.4
14:50:51.041 INFO  ProgressMeter -        chr17:2415218              0.5                 10100          19477.4
14:51:01.041 INFO  ProgressMeter -        chr17:3591332              0.7                 14920          21773.6
14:51:11.059 INFO  ProgressMeter -        chr17:4574538              0.9                 19100          22412.6
14:51:21.089 INFO  ProgressMeter -        chr17:5381890              1.0                 22460          22033.3
14:51:31.097 INFO  ProgressMeter -        chr17:6474462              1.2                 27070          22821.4
14:51:41.535 INFO  ProgressMeter -        chr17:7455949              1.4                 31150          22902.4
14:51:51.542 INFO  ProgressMeter -        chr17:8073825              1.5                 33820          22149.2
14:52:01.549 INFO  ProgressMeter -        chr17:9138632              1.7                 38220          22566.0
14:52:11.962 INFO  ProgressMeter -       chr17:10514361              1.9                 43840          23478.4
14:52:21.975 INFO  ProgressMeter -       chr17:11679575              2.0                 48560          23872.6
14:52:31.976 INFO  ProgressMeter -       chr17:13199399              2.2                 54680          24845.5
14:52:41.979 INFO  ProgressMeter -       chr17:15605076              2.4                 64200          27116.8
14:52:51.989 INFO  ProgressMeter -       chr17:16919249              2.5                 69490          27419.1
14:53:02.038 INFO  ProgressMeter -       chr17:18491155              2.7                 75790          28051.2
14:53:12.077 INFO  ProgressMeter -       chr17:19909866              2.9                 81350          28353.2
14:53:22.799 INFO  ProgressMeter -       chr17:21302399              3.0                 86850          28495.3
14:53:32.940 INFO  ProgressMeter -       chr17:22126918              3.2                 90550          28148.4
14:53:42.951 INFO  ProgressMeter -       chr17:26982638              3.4                107470          31760.8
14:53:52.961 INFO  ProgressMeter -       chr17:28688868              3.6                114190          32161.2
...
/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-launch HaplotypeCaller \
>     -R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa \
>     -I HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram \
>     -O HG00190_cram_HC.g.vcf \
>     -L chr17:1-100,000 \
>     -ERC GVCF 

Finishes successfully.

NeginValizadegan commented 1 year ago

I'm having this issue with cram files using several Picard tools versions including picard/2.10.1 when I use CollectInsertSizeMetrics. The reference is correct and works with samtools. Any ideas?

lbergelson commented 1 year ago

@NeginValizadegan If you really mean picard 2.10.1 then you should definitely upgrade. That version is ancient. The current is 2.27.5

NeginValizadegan commented 1 year ago

Yes our cluster had this version. I asked them to upgrade it to 2.27.5 and tried that one. Still same error at the same exact chromosome location:

ERROR 2023-02-01 16:32:52 Slice Reference MD5 mismatch for slice 0:248735807-248774361, AAGATTTGTG...AAGTCTAAGC [Wed Feb 01 16:32:52 CST 2023] picard.analysis.CollectInsertSizeMetrics done. Elapsed time: 4.70 minutes. Runtime.totalMemory()=5790236672 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp Exception in thread "main" htsjdk.samtools.cram.CRAMException: Reference sequence MD5 mismatch for slice: sequence id 0, start 248735807, span 38555, expected MD5 51a4da6ed30bd549660b19193faae437 at htsjdk.samtools.CRAMIterator.nextContainer(CRAMIterator.java:187) at htsjdk.samtools.CRAMIterator.hasNext(CRAMIterator.java:261) at htsjdk.samtools.SamReader$AssertingIterator.hasNext(SamReader.java:592) at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:129) at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:77) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)

lbergelson commented 1 year ago

@NeginValizadegan Thanks for looking into it, and sorry you're still seeing the issue.

Is that error message from the 2.10.1 version? It doesn't line up with the code in 2.27.1 so either you're posting your original error, or you're not getting the version of picard you expect. Would it be possible to include the entire program log, and the command line to run it here?

NeginValizadegan commented 1 year ago

@lbergelson Yes the error was from a previous version as I didn't have the new error recorded. I ran it again and here is the code and error. Just for clarity, I simplified the code paste here, removed long paths and stuff but some of those are shown in the error. Thanks!

java -jar /home/apps/software/picard/2.27.5-Java-1.8.0_201/picard.jar CollectInsertSizeMetrics \ I=HG03125.final.cram \ O=insertSize_metrics.txt \ H=insertSize_hist.pdf \ R=GRCh38_full_analysis_set_plus_decoy_hla.fa \ M=0.5

[Thu Feb 02 13:05:04 CST 2023] picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=results/bowtie2/assembly/Read-Prep/Bowtie/Aligned/stats/HG03125_insertSize_hist.pdf MINIMUM_PCT=0.5 INPUT=./crams/AFR/HG03125.final.cram OUTPUT=results/bowtie2/assembly/Read-Prep/Bowtie/Aligned/stats/HG03125_insertSize_metrics.txt REFERENCE_SEQUENCE=/home/groups/h3abionet/RefGraph/data/genomes/human/GRCh38/GRCh38_full_analysis_set_plus_decoy_hla.fa DEVIATIONS=10.0 METRIC_ACCUMULATION_LEVEL=[ALL_READS] INCLUDE_DUPLICATES=false ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json [Thu Feb 02 13:05:04 CST 2023] Executing as valizad2@compute-5-1 on Linux 4.18.0-348.23.1.el8_5.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09; Picard version: 2.9.0-1-gf5b9f50-SNAPSHOT INFO 2023-02-02 13:05:15 SinglePassSamProgram Processed 1,000,000 records. Elapsed time: 00:00:11s. Time for last 1,000,000: 6s. Last read position: chr1:3,185,445 INFO 2023-02-02 13:05:20 SinglePassSamProgram Processed 2,000,000 records. Elapsed time: 00:00:16s. Time for last 1,000,000: 5s. Last read position: chr1:6,814,058 INFO 2023-02-02 13:05:25 SinglePassSamProgram Processed 3,000,000 records. Elapsed time: 00:00:21s. Time for last 1,000,000: 4s. Last read position: chr1:10,463,599 INFO 2023-02-02 13:05:30 SinglePassSamProgram Processed 4,000,000 records. Elapsed time: 00:00:26s. Time for last 1,000,000: 4s. Last read position: chr1:14,254,980 INFO 2023-02-02 13:05:35 SinglePassSamProgram Processed 5,000,000 records. Elapsed time: 00:00:31s. Time for last 1,000,000: 5s. Last read position: chr1:17,575,502 INFO 2023-02-02 13:05:40 SinglePassSamProgram Processed 6,000,000 records. Elapsed time: 00:00:36s. Time for last 1,000,000: 4s. Last read position: chr1:21,322,246 INFO 2023-02-02 13:05:45 SinglePassSamProgram Processed 7,000,000 records. Elapsed time: 00:00:40s. Time for last 1,000,000: 4s. Last read position: chr1:24,924,908 INFO 2023-02-02 13:05:50 SinglePassSamProgram Processed 8,000,000 records. Elapsed time: 00:00:45s. Time for last 1,000,000: 4s. Last read position: chr1:28,499,322 INFO 2023-02-02 13:05:54 SinglePassSamProgram Processed 9,000,000 records. Elapsed time: 00:00:50s. Time for last 1,000,000: 4s. Last read position: chr1:32,183,454 INFO 2023-02-02 13:05:59 SinglePassSamProgram Processed 10,000,000 records. Elapsed time: 00:00:55s. Time for last 1,000,000: 4s. Last read position: chr1:35,986,511 INFO 2023-02-02 13:06:04 SinglePassSamProgram Processed 11,000,000 records. Elapsed time: 00:01:00s. Time for last 1,000,000: 4s. Last read position: chr1:39,695,439 INFO 2023-02-02 13:06:09 SinglePassSamProgram Processed 12,000,000 records. Elapsed time: 00:01:04s. Time for last 1,000,000: 4s. Last read position: chr1:43,425,445 INFO 2023-02-02 13:06:13 SinglePassSamProgram Processed 13,000,000 records. Elapsed time: 00:01:09s. Time for last 1,000,000: 4s. Last read position: chr1:47,108,648 INFO 2023-02-02 13:06:18 SinglePassSamProgram Processed 14,000,000 records. Elapsed time: 00:01:14s. Time for last 1,000,000: 4s. Last read position: chr1:51,168,387 INFO 2023-02-02 13:06:23 SinglePassSamProgram Processed 15,000,000 records. Elapsed time: 00:01:18s. Time for last 1,000,000: 4s. Last read position: chr1:54,875,958 INFO 2023-02-02 13:06:27 SinglePassSamProgram Processed 16,000,000 records. Elapsed time: 00:01:23s. Time for last 1,000,000: 4s. Last read position: chr1:58,975,239 INFO 2023-02-02 13:06:32 SinglePassSamProgram Processed 17,000,000 records. Elapsed time: 00:01:27s. Time for last 1,000,000: 4s. Last read position: chr1:63,105,196 INFO 2023-02-02 13:06:37 SinglePassSamProgram Processed 18,000,000 records. Elapsed time: 00:01:32s. Time for last 1,000,000: 4s. Last read position: chr1:67,124,618 INFO 2023-02-02 13:06:41 SinglePassSamProgram Processed 19,000,000 records. Elapsed time: 00:01:37s. Time for last 1,000,000: 4s. Last read position: chr1:71,272,854 INFO 2023-02-02 13:06:46 SinglePassSamProgram Processed 20,000,000 records. Elapsed time: 00:01:41s. Time for last 1,000,000: 4s. Last read position: chr1:75,682,682 INFO 2023-02-02 13:06:50 SinglePassSamProgram Processed 21,000,000 records. Elapsed time: 00:01:46s. Time for last 1,000,000: 4s. Last read position: chr1:79,793,726 INFO 2023-02-02 13:06:55 SinglePassSamProgram Processed 22,000,000 records. Elapsed time: 00:01:51s. Time for last 1,000,000: 4s. Last read position: chr1:84,134,824 INFO 2023-02-02 13:07:00 SinglePassSamProgram Processed 23,000,000 records. Elapsed time: 00:01:55s. Time for last 1,000,000: 4s. Last read position: chr1:88,180,351 INFO 2023-02-02 13:07:04 SinglePassSamProgram Processed 24,000,000 records. Elapsed time: 00:02:00s. Time for last 1,000,000: 4s. Last read position: chr1:92,181,993 INFO 2023-02-02 13:07:09 SinglePassSamProgram Processed 25,000,000 records. Elapsed time: 00:02:05s. Time for last 1,000,000: 4s. Last read position: chr1:96,170,475 INFO 2023-02-02 13:07:14 SinglePassSamProgram Processed 26,000,000 records. Elapsed time: 00:02:09s. Time for last 1,000,000: 4s. Last read position: chr1:100,366,319 INFO 2023-02-02 13:07:18 SinglePassSamProgram Processed 27,000,000 records. Elapsed time: 00:02:14s. Time for last 1,000,000: 4s. Last read position: chr1:104,482,942 INFO 2023-02-02 13:07:23 SinglePassSamProgram Processed 28,000,000 records. Elapsed time: 00:02:19s. Time for last 1,000,000: 4s. Last read position: chr1:108,742,994 INFO 2023-02-02 13:07:28 SinglePassSamProgram Processed 29,000,000 records. Elapsed time: 00:02:23s. Time for last 1,000,000: 4s. Last read position: chr1:112,497,950 INFO 2023-02-02 13:07:33 SinglePassSamProgram Processed 30,000,000 records. Elapsed time: 00:02:28s. Time for last 1,000,000: 4s. Last read position: chr1:116,441,009 INFO 2023-02-02 13:07:37 SinglePassSamProgram Processed 31,000,000 records. Elapsed time: 00:02:33s. Time for last 1,000,000: 4s. Last read position: chr1:120,502,290 INFO 2023-02-02 13:07:43 SinglePassSamProgram Processed 32,000,000 records. Elapsed time: 00:02:39s. Time for last 1,000,000: 5s. Last read position: chr1:123,027,314 INFO 2023-02-02 13:07:49 SinglePassSamProgram Processed 33,000,000 records. Elapsed time: 00:02:45s. Time for last 1,000,000: 5s. Last read position: chr1:124,048,686 INFO 2023-02-02 13:07:55 SinglePassSamProgram Processed 34,000,000 records. Elapsed time: 00:02:51s. Time for last 1,000,000: 5s. Last read position: chr1:124,736,341 INFO 2023-02-02 13:08:01 SinglePassSamProgram Processed 35,000,000 records. Elapsed time: 00:02:57s. Time for last 1,000,000: 6s. Last read position: chr1:143,192,729 INFO 2023-02-02 13:08:08 SinglePassSamProgram Processed 36,000,000 records. Elapsed time: 00:03:03s. Time for last 1,000,000: 6s. Last read position: chr1:143,252,697 INFO 2023-02-02 13:08:14 SinglePassSamProgram Processed 37,000,000 records. Elapsed time: 00:03:09s. Time for last 1,000,000: 6s. Last read position: chr1:146,059,513 INFO 2023-02-02 13:08:19 SinglePassSamProgram Processed 38,000,000 records. Elapsed time: 00:03:15s. Time for last 1,000,000: 5s. Last read position: chr1:149,868,216 INFO 2023-02-02 13:08:24 SinglePassSamProgram Processed 39,000,000 records. Elapsed time: 00:03:20s. Time for last 1,000,000: 4s. Last read position: chr1:153,658,464 INFO 2023-02-02 13:08:29 SinglePassSamProgram Processed 40,000,000 records. Elapsed time: 00:03:24s. Time for last 1,000,000: 4s. Last read position: chr1:157,187,828 INFO 2023-02-02 13:08:34 SinglePassSamProgram Processed 41,000,000 records. Elapsed time: 00:03:29s. Time for last 1,000,000: 4s. Last read position: chr1:161,081,746 INFO 2023-02-02 13:08:38 SinglePassSamProgram Processed 42,000,000 records. Elapsed time: 00:03:34s. Time for last 1,000,000: 4s. Last read position: chr1:165,016,385 INFO 2023-02-02 13:08:43 SinglePassSamProgram Processed 43,000,000 records. Elapsed time: 00:03:39s. Time for last 1,000,000: 4s. Last read position: chr1:168,968,109 INFO 2023-02-02 13:08:48 SinglePassSamProgram Processed 44,000,000 records. Elapsed time: 00:03:43s. Time for last 1,000,000: 4s. Last read position: chr1:172,988,366 INFO 2023-02-02 13:08:52 SinglePassSamProgram Processed 45,000,000 records. Elapsed time: 00:03:48s. Time for last 1,000,000: 4s. Last read position: chr1:176,949,619 INFO 2023-02-02 13:08:57 SinglePassSamProgram Processed 46,000,000 records. Elapsed time: 00:03:53s. Time for last 1,000,000: 4s. Last read position: chr1:180,949,338 INFO 2023-02-02 13:09:02 SinglePassSamProgram Processed 47,000,000 records. Elapsed time: 00:03:57s. Time for last 1,000,000: 4s. Last read position: chr1:184,807,396 INFO 2023-02-02 13:09:06 SinglePassSamProgram Processed 48,000,000 records. Elapsed time: 00:04:02s. Time for last 1,000,000: 4s. Last read position: chr1:189,053,872 INFO 2023-02-02 13:09:11 SinglePassSamProgram Processed 49,000,000 records. Elapsed time: 00:04:07s. Time for last 1,000,000: 4s. Last read position: chr1:193,313,533 INFO 2023-02-02 13:09:16 SinglePassSamProgram Processed 50,000,000 records. Elapsed time: 00:04:11s. Time for last 1,000,000: 4s. Last read position: chr1:197,722,580 INFO 2023-02-02 13:09:20 SinglePassSamProgram Processed 51,000,000 records. Elapsed time: 00:04:16s. Time for last 1,000,000: 4s. Last read position: chr1:201,594,892 INFO 2023-02-02 13:09:25 SinglePassSamProgram Processed 52,000,000 records. Elapsed time: 00:04:21s. Time for last 1,000,000: 4s. Last read position: chr1:205,296,228 INFO 2023-02-02 13:09:30 SinglePassSamProgram Processed 53,000,000 records. Elapsed time: 00:04:26s. Time for last 1,000,000: 4s. Last read position: chr1:209,114,496 INFO 2023-02-02 13:09:35 SinglePassSamProgram Processed 54,000,000 records. Elapsed time: 00:04:30s. Time for last 1,000,000: 4s. Last read position: chr1:212,990,736 INFO 2023-02-02 13:09:39 SinglePassSamProgram Processed 55,000,000 records. Elapsed time: 00:04:35s. Time for last 1,000,000: 4s. Last read position: chr1:217,217,647 INFO 2023-02-02 13:09:44 SinglePassSamProgram Processed 56,000,000 records. Elapsed time: 00:04:40s. Time for last 1,000,000: 4s. Last read position: chr1:221,387,522 INFO 2023-02-02 13:09:49 SinglePassSamProgram Processed 57,000,000 records. Elapsed time: 00:04:45s. Time for last 1,000,000: 5s. Last read position: chr1:225,314,304 INFO 2023-02-02 13:09:54 SinglePassSamProgram Processed 58,000,000 records. Elapsed time: 00:04:49s. Time for last 1,000,000: 4s. Last read position: chr1:228,891,181 INFO 2023-02-02 13:09:58 SinglePassSamProgram Processed 59,000,000 records. Elapsed time: 00:04:54s. Time for last 1,000,000: 4s. Last read position: chr1:232,844,624 INFO 2023-02-02 13:10:03 SinglePassSamProgram Processed 60,000,000 records. Elapsed time: 00:04:59s. Time for last 1,000,000: 4s. Last read position: chr1:236,642,862 INFO 2023-02-02 13:10:08 SinglePassSamProgram Processed 61,000,000 records. Elapsed time: 00:05:03s. Time for last 1,000,000: 4s. Last read position: chr1:241,003,784 INFO 2023-02-02 13:10:13 SinglePassSamProgram Processed 62,000,000 records. Elapsed time: 00:05:08s. Time for last 1,000,000: 5s. Last read position: chr1:245,019,923 ERROR 2023-02-02 13:10:18 Slice Reference MD5 mismatch for slice 0:248741017-248780008, GATGGGAGAT...AGGGAAATAT [Thu Feb 02 13:10:18 CST 2023] picard.analysis.CollectInsertSizeMetrics done. Elapsed time: 5.23 minutes. Runtime.totalMemory()=5051514880 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp Exception in thread "main" htsjdk.samtools.cram.CRAMException: Reference sequence MD5 mismatch for slice: sequence id 0, start 248741017, span 38992, expected MD5 37a71701c8b7513578af280cdbcc4bda at htsjdk.samtools.CRAMIterator.nextContainer(CRAMIterator.java:187) at htsjdk.samtools.CRAMIterator.hasNext(CRAMIterator.java:261) at htsjdk.samtools.SamReader$AssertingIterator.hasNext(SamReader.java:592) at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:129) at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:77) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)

NeginValizadegan commented 1 year ago

The previous error was for another sample too. The was the only one I had recorded. That's why the chr position are not exactly the same

cmnbroad commented 1 year ago

@NeginValizadegan For some reason you appear to be running very old Picard versions. This most recent error message you posted is coming from an even older version of Picard (2.9.0-1-gf5b9f50-SNAPSHOT) than the first one you posted (which you said was 2.10.1). I would suggest upgrading to 2.27.5.

NeginValizadegan commented 1 year ago

hmm strange. How do you know which version it is? It might be helpful for me to know that for troubleshooting.

NeginValizadegan commented 1 year ago

oh right I just saw it. Strange. I will follow up with our IT to see why this discrepancy exist.

NeginValizadegan commented 1 year ago

Okay apparently there is not a version number in the picard.jar file downloaded from https://github.com/broadinstitute/picard/releases and thus if easybuild detects a cache copy, it will use that instead of downloading it which was an older version. They forced it to download and now version is correct. I will re-try and see if the error persists. Thanks for catching that.

NeginValizadegan commented 1 year ago

That seem to have helped. Thanks so much!

lbergelson commented 1 year ago

👍