Closed: sooheelee closed this issue 6 years ago.
@cmnbroad Can you please have a look?
@sooheelee Since samtools can decode it, this sounds exactly like https://github.com/samtools/htsjdk/issues/800, for which we have a two-part fix (this and this) that is very close to being ready.
OK. Very excited to have this feature working.
@sooheelee Are there ambiguity codes in the reads/reference near the locus where it fails?
I haven't checked, @droazen. Is this important for your work?
Sorry for the delay in answering. I've been busy with the workshop somatic CNV tutorial writing.
The fix is now merged in htsjdk: https://github.com/samtools/htsjdk/pull/889. I'll keep this open until we update GATK with a new htsjdk.
@sooheelee This should be fixed and I'd like to close it. Can you verify that it works for your case when you get a chance?
Just back from workshop/vacay. I will take a look.
Hmm, it looks like I deleted the CRAM /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/cram/HG02759.alt_bwamem_GRCh38DH.20150826.GWD.exome.cram. (I was a high-storage-cost offender and was asked to clean up.) Luckily, I have another CRAM available that I left for testing purposes. I can recapitulate the error with this other CRAM using the old gatk-4.alpha.2-1134-ga9d9d91-SNAPSHOT jar.
-bash-4.1$ /humgen/gsa-hpprojects/GATK/gatk4/gatk-4.alpha.2-1134-ga9d9d91-SNAPSHOT/gatk-launch PrintReads -R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa -I /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram -O /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/test_decram_20171002.bam
This gives the same error in approximately the same region:
...
10:50:54.603 INFO ProgressMeter - chr1:224042054 2.2 11004000 5016945.0
10:51:04.609 INFO ProgressMeter - chr1:248061327 2.4 11905000 5044206.5
ERROR 2017-10-02 10:51:05 Slice Reference MD5 mismatch for slice 0:248574592-248771907, AGTGGATGAG...TGTCGGTATG
10:51:06.940 INFO PrintReads - Shutting down engine
[October 2, 2017 10:51:06 AM EDT] org.broadinstitute.hellbender.tools.PrintReads done. Elapsed time: 2.45 minutes.
Runtime.totalMemory()=5995233280
htsjdk.samtools.cram.CRAMException: Reference sequence MD5 mismatch for slice: sequence id 0, start 248574592, span 197316, expected MD5 cc8ace0545facc11349da783af07a076
at htsjdk.samtools.CRAMIterator.nextContainer(CRAMIterator.java:187)
at htsjdk.samtools.CRAMIterator.hasNext(CRAMIterator.java:261)
at htsjdk.samtools.SamReader$AssertingIterator.hasNext(SamReader.java:601)
at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.hasNext(SAMRecordToReadIterator.java:24)
at java.util.Iterator.forEachRemaining(Iterator.java:115)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:779)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:115)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:170)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:189)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:122)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:143)
at org.broadinstitute.hellbender.Main.main(Main.java:221)
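For context, the check that throws here digests the reference bases covered by each slice and compares the result against the MD5 stored in the CRAM; when the supplied reference differs from the one used at compression time, the digests disagree and htsjdk raises the CRAMException above. A minimal sketch of that digest in Python (the function name and toy sequence are mine, not htsjdk's; htsjdk hashes the uppercased reference bases spanned by the slice):

```python
import hashlib

def slice_reference_md5(reference_bases: bytes, start: int, span: int) -> str:
    """MD5 of the uppercased reference bases covered by a slice.

    start is 1-based (as in the error message); span is the slice length.
    """
    covered = reference_bases[start - 1 : start - 1 + span].upper()
    return hashlib.md5(covered).hexdigest()

# Toy example: a slice covering bases 3..6 of a tiny "reference".
ref = b"ttACGTacgt"
observed = slice_reference_md5(ref, 3, 4)   # digest of b"ACGT"
expected = hashlib.md5(b"ACGT").hexdigest()
# A mismatch between observed and expected is exactly what raises
# CRAMException in htsjdk.
assert observed == expected
```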
Now, using the latest jar, gatk-4.beta.5, with the same command:
-bash-4.1$ /humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-launch PrintReads -R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa -I /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram -O /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/test_decram_20171002.bam
I see that the run goes past the problem region.
...
10:56:39.570 INFO ProgressMeter - chr1:179846959 2.2 9271000 4228249.1
10:56:49.572 INFO ProgressMeter - chr1:198702508 2.4 9889000 4191438.3
10:56:59.631 INFO ProgressMeter - chr1:219611298 2.5 10821000 4282152.8
10:57:09.660 INFO ProgressMeter - chr1:242292327 2.7 11691000 4339428.9
10:57:19.665 INFO ProgressMeter - chr2:6849819 2.9 12102000 4230137.4
10:57:29.667 INFO ProgressMeter - chr2:26448455 3.0 12672000 4185516.5
10:57:39.696 INFO ProgressMeter - chr2:48688700 3.2 13582000 4251349.9
10:57:49.698 INFO ProgressMeter - chr2:74009861 3.4 14355000 4270478.5
...
I'll let this finish (and post again if there is an error), but I suspect the issue is solved. Thank you @cmnbroad for the fix.
All good:
11:15:37.269 INFO ProgressMeter - chrX:102154117 21.2 105064000 4966574.8
11:15:47.308 INFO ProgressMeter - chrX:148052583 21.3 105984000 4970730.0
11:15:57.311 INFO ProgressMeter - chr1_KI270711v1_random:8733 21.5 106809000 4970557.7
11:16:07.348 INFO ProgressMeter - chr1_KI270763v1_alt:521207 21.7 107601000 4968734.1
11:16:17.392 INFO ProgressMeter - chr7_KI270803v1_alt:794832 21.8 108422000 4968240.7
11:16:27.410 INFO ProgressMeter - chr16_KI270853v1_alt:1992946 22.0 109486000 4978907.0
11:16:37.438 INFO ProgressMeter - chr19_KI270885v1_alt:18015 22.2 110571000 4990315.4
11:16:47.479 INFO ProgressMeter - chr6_GL000251v2_alt:3288752 22.3 111608000 4999358.0
11:16:57.479 INFO ProgressMeter - chr19_KI270915v1_alt:116858 22.5 112656000 5008907.2
11:17:07.482 INFO ProgressMeter - chr6_GL000252v2_alt:3648002 22.7 113732000 5019540.7
11:17:17.485 INFO ProgressMeter - chr19_GL000209v2_alt:107854 22.8 114378000 5011183.1
11:17:27.485 INFO ProgressMeter - chr6_GL000254v2_alt:4706141 23.0 115453000 5021609.7
11:17:37.489 INFO ProgressMeter - chr6_GL000256v2_alt:3229060 23.2 116516000 5031360.7
11:17:47.495 INFO ProgressMeter - chrUn_KN707963v1_decoy:59481 23.3 117352000 5031225.8
11:17:57.500 INFO ProgressMeter - HLA-A*02:53N:2725 23.5 117662000 5008712.4
11:18:02.114 INFO PrintReads - No reads filtered by: WellformedReadFilter
11:18:02.114 INFO ProgressMeter - unmapped 23.6 118064258 5009433.9
11:18:02.114 INFO ProgressMeter - Traversal complete. Processed 118064258 total reads in 23.6 minutes.
11:18:05.615 INFO PrintReads - Shutting down engine
[October 2, 2017 11:18:05 AM EDT] org.broadinstitute.hellbender.tools.PrintReads done. Elapsed time: 23.67 minutes.
Runtime.totalMemory()=11493965824
And I can view the BAM with Samtools:
samtools view /humgen/gsa-hpprojects/dev/shlee/1kg_GRCh38_exome/test_decram_20171002.bam | less
SRR070790.20928984 99 chr1 10004 0 3S97M = 10036 116 AACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC <<<<<BFBBBBBFBBBBBBBBBBBFBBBBBFBBBBBBBBBBBBBBBBBBBBBBBBBBBBBFBBBBBBBBB7B<BBBB<F<B<<BBBB<<BFB<770<<<7 MC:Z:61M1D22M17S RG:Z:SRR070790 MQ:i:0 AS:i:97 XS:i:100
SRR070503.24638125 145 chr1 10008 0 100M chr6 8088710 0 AGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC ''''''''''''BBB<<<<7<7<<BBBBFBBBFFBBBBBBFBFBBBFBBBBBFBFBBBFBFBBBF<BBBBFBBBBBFBBBBBFBBBBBFBBBBBB<<<<< RG:Z:SRR070503 MQ:i:60 AS:i:98 XS:i:98
SRR070790.26540140 65 chr1 10016 0 20M3D9M3D13M1I57M chr4 190122739 0 CCCTAACCCTAACCACAACCACCCTAACCACCCT
...
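For anyone hitting this later: a quick way to check whether a local reference FASTA matches what a CRAM expects is to compute each sequence's M5 digest (MD5 of the uppercased bases with line breaks removed, the scheme CRAM uses) and compare it against the @SQ M5 tags in the CRAM header. A minimal sketch in Python; the function name is mine, and the paths in the usage comment are placeholders:

```python
import hashlib

def fasta_m5_digests(fasta_lines):
    """Per-sequence M5 digests for a FASTA (MD5 of the uppercased bases).

    These are the values `samtools dict` reports, and what CRAM readers
    compare against the @SQ M5 tags in the file header.
    """
    digests, name, md5 = {}, None, None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                digests[name] = md5.hexdigest()
            # Sequence name is the first word after ">".
            name = line[1:].split()[0]
            md5 = hashlib.md5()
        elif line:
            md5.update(line.upper().encode())
    if name is not None:
        digests[name] = md5.hexdigest()
    return digests

# Usage against a real reference (paths are placeholders):
# with open("GRCh38_full_analysis_set_plus_decoy_hla.fa") as fh:
#     print(fasta_m5_digests(fh))
# Compare against: samtools view -H sample.cram | grep '^@SQ'
```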
@sooheelee Thanks for digging that up and doing that verification.
So at this point are we free to run GATK tools on CRAMs without reverting to BAMs first?
No problem, Chris. I'm going to assume any tool that takes a BAM can also run on a CRAM. I will be testing this, and if I find otherwise, I'll let the team know.
@ldgauthier @sooheelee AFAIK, this was the main blocking issue. One caveat though: last week we replaced all of the copies of the Picard tools in GATK with the ACTUAL tools from the Picard jar. The GATK copies were mostly CRAM-enabled and tested, but I think the Picard ones are in various mixed states with respect to CRAM read/write capabilities.
Supposing I wanted to re-run HaplotypeCaller from the CRAM, is that a tool that's been tested with CRAM?
Related issue https://github.com/broadinstitute/gatk/issues/3669 from testing CRAM usage.
I doubt much CRAM testing has been done with HaplotypeCaller, and I don't see any integration tests for it using CRAM. FWIW, since we have a CRAM version of the BAM we use for those tests, I did just try running them on the CRAM and they passed. As SooHee mentioned though, you'll have issues if the reference you use is in a GCS bucket, or if you specify intervals and the CRAM (and its index) are in a bucket.
Yes, I have also just tested HaplotypeCaller on a CRAM and it seems to run fine locally.
/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-launch HaplotypeCaller \
-R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa \
-I HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram \
-O HG00190_cram_HC.vcf \
-L chr17
gives
Using GATK jar /humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-package-4.beta.5-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Dsnappy.disable=true -jar /humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-package-4.beta.5-local.jar HaplotypeCaller -R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa -I HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram -O HG00190_cram_HC.vcf -L chr17
14:50:16.528 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/intel/gkl/native/libgkl_compression.so
[October 6, 2017 2:50:16 PM EDT] HaplotypeCaller --output HG00190_cram_HC.vcf --intervals chr17 --input HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram --reference /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa --group StandardAnnotation --group StandardHCAnnotation --GVCFGQBands 1 --GVCFGQBands 2 --GVCFGQBands 3 --GVCFGQBands 4 --GVCFGQBands 5 --GVCFGQBands 6 --GVCFGQBands 7 --GVCFGQBands 8 --GVCFGQBands 9 --GVCFGQBands 10 --GVCFGQBands 11 --GVCFGQBands 12 --GVCFGQBands 13 --GVCFGQBands 14 --GVCFGQBands 15 --GVCFGQBands 16 --GVCFGQBands 17 --GVCFGQBands 18 --GVCFGQBands 19 --GVCFGQBands 20 --GVCFGQBands 21 --GVCFGQBands 22 --GVCFGQBands 23 --GVCFGQBands 24 --GVCFGQBands 25 --GVCFGQBands 26 --GVCFGQBands 27 --GVCFGQBands 28 --GVCFGQBands 29 --GVCFGQBands 30 --GVCFGQBands 31 --GVCFGQBands 32 --GVCFGQBands 33 --GVCFGQBands 34 --GVCFGQBands 35 --GVCFGQBands 36 --GVCFGQBands 37 --GVCFGQBands 38 --GVCFGQBands 39 --GVCFGQBands 40 --GVCFGQBands 41 --GVCFGQBands 42 --GVCFGQBands 43 --GVCFGQBands 44 --GVCFGQBands 45 --GVCFGQBands 46 --GVCFGQBands 47 --GVCFGQBands 48 --GVCFGQBands 49 --GVCFGQBands 50 --GVCFGQBands 51 --GVCFGQBands 52 --GVCFGQBands 53 --GVCFGQBands 54 --GVCFGQBands 55 --GVCFGQBands 56 --GVCFGQBands 57 --GVCFGQBands 58 --GVCFGQBands 59 --GVCFGQBands 60 --GVCFGQBands 70 --GVCFGQBands 80 --GVCFGQBands 90 --GVCFGQBands 99 --indelSizeToEliminateInRefModel 10 --useAllelesTrigger false --dontTrimActiveRegions false --maxDiscARExtension 25 --maxGGAARExtension 300 --paddingAroundIndels 150 --paddingAroundSNPs 20 --kmerSize 10 --kmerSize 25 --dontIncreaseKmerSizesForCycles false --allowNonUniqueKmersInRef false --numPruningSamples 1 --recoverDanglingHeads false --doNotRecoverDanglingBranches false --minDanglingBranchLength 4 --consensus false --maxNumHaplotypesInPopulation 128 --errorCorrectKmers false --minPruning 2 --debugGraphTransformations false --kmerLengthForReadErrorCorrection 25 --minObservationsForKmerToBeSolid 
20 --likelihoodCalculationEngine PairHMM --base_quality_score_threshold 18 --gcpHMM 10 --pair_hmm_implementation FASTEST_AVAILABLE --pcr_indel_model CONSERVATIVE --phredScaledGlobalReadMismappingRate 45 --nativePairHmmThreads 4 --useDoublePrecision false --debug false --useFilteredReadsForAnnotations false --emitRefConfidence NONE --bamWriterType CALLED_HAPLOTYPES --disableOptimizations false --justDetermineActiveRegions false --dontGenotype false --dontUseSoftClippedBases false --captureAssemblyFailureBAM false --errorCorrectReads false --doNotRunPhysicalPhasing false --min_base_quality_score 10 --useNewAFCalculator false --annotateNDA false --heterozygosity 0.001 --indel_heterozygosity 1.25E-4 --heterozygosity_stdev 0.01 --standard_min_confidence_threshold_for_calling 10.0 --max_alternate_alleles 6 --max_genotype_count 1024 --sample_ploidy 2 --genotyping_mode DISCOVERY --contamination_fraction_to_filter 0.0 --output_mode EMIT_VARIANTS_ONLY --allSitePLs false --readShardSize -1 --readShardPadding 100 --minAssemblyRegionSize 50 --maxAssemblyRegionSize 300 --assemblyRegionPadding 100 --maxReadsPerAlignmentStart 50 --activeProbabilityThreshold 0.002 --maxProbPropagationDistance 50 --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --readValidationStringency SILENT --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation false --createOutputBamIndex true --createOutputBamMD5 false --createOutputVariantIndex true --createOutputVariantMD5 false --lenient false --addOutputSAMProgramRecord true --addOutputVCFCommandLine true --cloudPrefetchBuffer 40 --cloudIndexPrefetchBuffer -1 --disableBamIndexCaching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20 --disableToolDefaultReadFilters false --minimumMappingQuality 20
[October 6, 2017 2:50:16 PM EDT] Executing as shlee@gsa5.broadinstitute.org on Linux 2.6.32-642.15.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Version: 4.beta.5
14:50:16.818 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
14:50:16.818 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:50:16.818 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:50:16.818 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:50:16.818 INFO HaplotypeCaller - Deflater: IntelDeflater
14:50:16.818 INFO HaplotypeCaller - Inflater: IntelInflater
14:50:16.818 INFO HaplotypeCaller - GCS max retries/reopens: 20
14:50:16.818 INFO HaplotypeCaller - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
14:50:16.819 INFO HaplotypeCaller - Initializing engine
14:50:18.950 INFO IntervalArgumentCollection - Processing 83257441 bp from intervals
14:50:18.965 INFO HaplotypeCaller - Done initializing engine
14:50:19.021 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
14:50:19.280 WARN PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
14:50:19.481 WARN PossibleDeNovo - Annotation will not be calculated, must provide a valid PED file (-ped) from the command line.
14:50:19.776 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/intel/gkl/native/libgkl_utils.so
14:50:19.795 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-package-4.beta.5-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
14:50:19.847 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
14:50:19.848 INFO IntelPairHmm - Available threads: 48
14:50:19.848 INFO IntelPairHmm - Requested threads: 4
14:50:19.848 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
14:50:19.926 INFO ProgressMeter - Starting traversal
14:50:19.926 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
14:50:30.309 INFO ProgressMeter - chr17:740224 0.2 3010 17395.5
14:50:41.016 INFO ProgressMeter - chr17:1675683 0.4 7020 19973.4
14:50:51.041 INFO ProgressMeter - chr17:2415218 0.5 10100 19477.4
14:51:01.041 INFO ProgressMeter - chr17:3591332 0.7 14920 21773.6
14:51:11.059 INFO ProgressMeter - chr17:4574538 0.9 19100 22412.6
14:51:21.089 INFO ProgressMeter - chr17:5381890 1.0 22460 22033.3
14:51:31.097 INFO ProgressMeter - chr17:6474462 1.2 27070 22821.4
14:51:41.535 INFO ProgressMeter - chr17:7455949 1.4 31150 22902.4
14:51:51.542 INFO ProgressMeter - chr17:8073825 1.5 33820 22149.2
14:52:01.549 INFO ProgressMeter - chr17:9138632 1.7 38220 22566.0
14:52:11.962 INFO ProgressMeter - chr17:10514361 1.9 43840 23478.4
14:52:21.975 INFO ProgressMeter - chr17:11679575 2.0 48560 23872.6
14:52:31.976 INFO ProgressMeter - chr17:13199399 2.2 54680 24845.5
14:52:41.979 INFO ProgressMeter - chr17:15605076 2.4 64200 27116.8
14:52:51.989 INFO ProgressMeter - chr17:16919249 2.5 69490 27419.1
14:53:02.038 INFO ProgressMeter - chr17:18491155 2.7 75790 28051.2
14:53:12.077 INFO ProgressMeter - chr17:19909866 2.9 81350 28353.2
14:53:22.799 INFO ProgressMeter - chr17:21302399 3.0 86850 28495.3
14:53:32.940 INFO ProgressMeter - chr17:22126918 3.2 90550 28148.4
14:53:42.951 INFO ProgressMeter - chr17:26982638 3.4 107470 31760.8
14:53:52.961 INFO ProgressMeter - chr17:28688868 3.6 114190 32161.2
...
/humgen/gsa-hpprojects/GATK/gatk4/gatk-4.beta.5/gatk-launch HaplotypeCaller \
> -R /humgen/gsa-hpprojects/dev/shlee/ref/GRCh38_1kg/GRCh38_full_analysis_set_plus_decoy_hla.fa \
> -I HG00190.alt_bwamem_GRCh38DH.20150826.FIN.exome.cram \
> -O HG00190_cram_HC.g.vcf \
> -L chr17:1-100,000 \
> -ERC GVCF
Finishes successfully.
I'm having this issue with CRAM files across several Picard versions, including picard/2.10.1, when I use CollectInsertSizeMetrics. The reference is correct and works with samtools. Any ideas?
@NeginValizadegan If you really mean picard 2.10.1, then you should definitely upgrade. That version is ancient. The current version is 2.27.5.
Yes, our cluster had this version. I asked them to upgrade it to 2.27.5 and tried that one. I still get the same error at the exact same chromosome location:
ERROR 2023-02-01 16:32:52 Slice Reference MD5 mismatch for slice 0:248735807-248774361, AAGATTTGTG...AAGTCTAAGC
[Wed Feb 01 16:32:52 CST 2023] picard.analysis.CollectInsertSizeMetrics done. Elapsed time: 4.70 minutes.
Runtime.totalMemory()=5790236672
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.cram.CRAMException: Reference sequence MD5 mismatch for slice: sequence id 0, start 248735807, span 38555, expected MD5 51a4da6ed30bd549660b19193faae437
at htsjdk.samtools.CRAMIterator.nextContainer(CRAMIterator.java:187)
at htsjdk.samtools.CRAMIterator.hasNext(CRAMIterator.java:261)
at htsjdk.samtools.SamReader$AssertingIterator.hasNext(SamReader.java:592)
at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:129)
at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:77)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
@NeginValizadegan Thanks for looking into it, and sorry you're still seeing the issue.
Is that error message from the 2.10.1 version? It doesn't line up with the code in 2.27.5, so either you're posting your original error, or you're not getting the version of Picard you expect. Would it be possible to include the entire program log, and the command line used to run it?
@lbergelson Yes, the error was from a previous version, as I didn't have the new error recorded. I ran it again, and here are the command and error. Just for clarity, I simplified the command paste here and removed long paths, but some of those still appear in the error. Thanks!
java -jar /home/apps/software/picard/2.27.5-Java-1.8.0_201/picard.jar CollectInsertSizeMetrics \
    I=HG03125.final.cram \
    O=insertSize_metrics.txt \
    H=insertSize_hist.pdf \
    R=GRCh38_full_analysis_set_plus_decoy_hla.fa \
    M=0.5
[Thu Feb 02 13:05:04 CST 2023] picard.analysis.CollectInsertSizeMetrics HISTOGRAM_FILE=results/bowtie2/assembly/Read-Prep/Bowtie/Aligned/stats/HG03125_insertSize_hist.pdf MINIMUM_PCT=0.5 INPUT=./crams/AFR/HG03125.final.cram OUTPUT=results/bowtie2/assembly/Read-Prep/Bowtie/Aligned/stats/HG03125_insertSize_metrics.txt REFERENCE_SEQUENCE=/home/groups/h3abionet/RefGraph/data/genomes/human/GRCh38/GRCh38_full_analysis_set_plus_decoy_hla.fa DEVIATIONS=10.0 METRIC_ACCUMULATION_LEVEL=[ALL_READS] INCLUDE_DUPLICATES=false ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Thu Feb 02 13:05:04 CST 2023] Executing as valizad2@compute-5-1 on Linux 4.18.0-348.23.1.el8_5.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_201-b09; Picard version: 2.9.0-1-gf5b9f50-SNAPSHOT
INFO 2023-02-02 13:05:15 SinglePassSamProgram Processed 1,000,000 records. Elapsed time: 00:00:11s. Time for last 1,000,000: 6s. Last read position: chr1:3,185,445
INFO 2023-02-02 13:05:20 SinglePassSamProgram Processed 2,000,000 records. Elapsed time: 00:00:16s. Time for last 1,000,000: 5s. Last read position: chr1:6,814,058
INFO 2023-02-02 13:05:25 SinglePassSamProgram Processed 3,000,000 records. Elapsed time: 00:00:21s. Time for last 1,000,000: 4s. Last read position: chr1:10,463,599
INFO 2023-02-02 13:05:30 SinglePassSamProgram Processed 4,000,000 records. Elapsed time: 00:00:26s. Time for last 1,000,000: 4s. Last read position: chr1:14,254,980
INFO 2023-02-02 13:05:35 SinglePassSamProgram Processed 5,000,000 records. Elapsed time: 00:00:31s. Time for last 1,000,000: 5s. Last read position: chr1:17,575,502
INFO 2023-02-02 13:05:40 SinglePassSamProgram Processed 6,000,000 records. Elapsed time: 00:00:36s. Time for last 1,000,000: 4s. Last read position: chr1:21,322,246
INFO 2023-02-02 13:05:45 SinglePassSamProgram Processed 7,000,000 records. Elapsed time: 00:00:40s. Time for last 1,000,000: 4s. Last read position: chr1:24,924,908
INFO 2023-02-02 13:05:50 SinglePassSamProgram Processed 8,000,000 records. Elapsed time: 00:00:45s. Time for last 1,000,000: 4s. Last read position: chr1:28,499,322
INFO 2023-02-02 13:05:54 SinglePassSamProgram Processed 9,000,000 records. Elapsed time: 00:00:50s. Time for last 1,000,000: 4s. Last read position: chr1:32,183,454
INFO 2023-02-02 13:05:59 SinglePassSamProgram Processed 10,000,000 records. Elapsed time: 00:00:55s. Time for last 1,000,000: 4s. Last read position: chr1:35,986,511
INFO 2023-02-02 13:06:04 SinglePassSamProgram Processed 11,000,000 records. Elapsed time: 00:01:00s. Time for last 1,000,000: 4s. Last read position: chr1:39,695,439
INFO 2023-02-02 13:06:09 SinglePassSamProgram Processed 12,000,000 records. Elapsed time: 00:01:04s. Time for last 1,000,000: 4s. Last read position: chr1:43,425,445
INFO 2023-02-02 13:06:13 SinglePassSamProgram Processed 13,000,000 records. Elapsed time: 00:01:09s. Time for last 1,000,000: 4s. Last read position: chr1:47,108,648
INFO 2023-02-02 13:06:18 SinglePassSamProgram Processed 14,000,000 records. Elapsed time: 00:01:14s. Time for last 1,000,000: 4s. Last read position: chr1:51,168,387
INFO 2023-02-02 13:06:23 SinglePassSamProgram Processed 15,000,000 records. Elapsed time: 00:01:18s. Time for last 1,000,000: 4s. Last read position: chr1:54,875,958
INFO 2023-02-02 13:06:27 SinglePassSamProgram Processed 16,000,000 records. Elapsed time: 00:01:23s. Time for last 1,000,000: 4s. Last read position: chr1:58,975,239
INFO 2023-02-02 13:06:32 SinglePassSamProgram Processed 17,000,000 records. Elapsed time: 00:01:27s. Time for last 1,000,000: 4s. Last read position: chr1:63,105,196
INFO 2023-02-02 13:06:37 SinglePassSamProgram Processed 18,000,000 records. Elapsed time: 00:01:32s. Time for last 1,000,000: 4s. Last read position: chr1:67,124,618
INFO 2023-02-02 13:06:41 SinglePassSamProgram Processed 19,000,000 records. Elapsed time: 00:01:37s. Time for last 1,000,000: 4s. Last read position: chr1:71,272,854
INFO 2023-02-02 13:06:46 SinglePassSamProgram Processed 20,000,000 records. Elapsed time: 00:01:41s. Time for last 1,000,000: 4s. Last read position: chr1:75,682,682
INFO 2023-02-02 13:06:50 SinglePassSamProgram Processed 21,000,000 records. Elapsed time: 00:01:46s. Time for last 1,000,000: 4s. Last read position: chr1:79,793,726
INFO 2023-02-02 13:06:55 SinglePassSamProgram Processed 22,000,000 records. Elapsed time: 00:01:51s. Time for last 1,000,000: 4s. Last read position: chr1:84,134,824
INFO 2023-02-02 13:07:00 SinglePassSamProgram Processed 23,000,000 records. Elapsed time: 00:01:55s. Time for last 1,000,000: 4s. Last read position: chr1:88,180,351
INFO 2023-02-02 13:07:04 SinglePassSamProgram Processed 24,000,000 records. Elapsed time: 00:02:00s. Time for last 1,000,000: 4s. Last read position: chr1:92,181,993
INFO 2023-02-02 13:07:09 SinglePassSamProgram Processed 25,000,000 records. Elapsed time: 00:02:05s. Time for last 1,000,000: 4s. Last read position: chr1:96,170,475
INFO 2023-02-02 13:07:14 SinglePassSamProgram Processed 26,000,000 records. Elapsed time: 00:02:09s. Time for last 1,000,000: 4s. Last read position: chr1:100,366,319
INFO 2023-02-02 13:07:18 SinglePassSamProgram Processed 27,000,000 records. Elapsed time: 00:02:14s. Time for last 1,000,000: 4s. Last read position: chr1:104,482,942
INFO 2023-02-02 13:07:23 SinglePassSamProgram Processed 28,000,000 records. Elapsed time: 00:02:19s. Time for last 1,000,000: 4s. Last read position: chr1:108,742,994
INFO 2023-02-02 13:07:28 SinglePassSamProgram Processed 29,000,000 records. Elapsed time: 00:02:23s. Time for last 1,000,000: 4s. Last read position: chr1:112,497,950
INFO 2023-02-02 13:07:33 SinglePassSamProgram Processed 30,000,000 records. Elapsed time: 00:02:28s. Time for last 1,000,000: 4s. Last read position: chr1:116,441,009
INFO 2023-02-02 13:07:37 SinglePassSamProgram Processed 31,000,000 records. Elapsed time: 00:02:33s. Time for last 1,000,000: 4s. Last read position: chr1:120,502,290
INFO 2023-02-02 13:07:43 SinglePassSamProgram Processed 32,000,000 records. Elapsed time: 00:02:39s. Time for last 1,000,000: 5s. Last read position: chr1:123,027,314
INFO 2023-02-02 13:07:49 SinglePassSamProgram Processed 33,000,000 records. Elapsed time: 00:02:45s. Time for last 1,000,000: 5s. Last read position: chr1:124,048,686
INFO 2023-02-02 13:07:55 SinglePassSamProgram Processed 34,000,000 records. Elapsed time: 00:02:51s. Time for last 1,000,000: 5s. Last read position: chr1:124,736,341
INFO 2023-02-02 13:08:01 SinglePassSamProgram Processed 35,000,000 records. Elapsed time: 00:02:57s. Time for last 1,000,000: 6s. Last read position: chr1:143,192,729
INFO 2023-02-02 13:08:08 SinglePassSamProgram Processed 36,000,000 records. Elapsed time: 00:03:03s. Time for last 1,000,000: 6s. Last read position: chr1:143,252,697
INFO 2023-02-02 13:08:14 SinglePassSamProgram Processed 37,000,000 records. Elapsed time: 00:03:09s. Time for last 1,000,000: 6s. Last read position: chr1:146,059,513
INFO 2023-02-02 13:08:19 SinglePassSamProgram Processed 38,000,000 records. Elapsed time: 00:03:15s. Time for last 1,000,000: 5s. Last read position: chr1:149,868,216
INFO 2023-02-02 13:08:24 SinglePassSamProgram Processed 39,000,000 records. Elapsed time: 00:03:20s. Time for last 1,000,000: 4s. Last read position: chr1:153,658,464
INFO 2023-02-02 13:08:29 SinglePassSamProgram Processed 40,000,000 records. Elapsed time: 00:03:24s. Time for last 1,000,000: 4s. Last read position: chr1:157,187,828
INFO 2023-02-02 13:08:34 SinglePassSamProgram Processed 41,000,000 records. Elapsed time: 00:03:29s. Time for last 1,000,000: 4s. Last read position: chr1:161,081,746
INFO 2023-02-02 13:08:38 SinglePassSamProgram Processed 42,000,000 records. Elapsed time: 00:03:34s. Time for last 1,000,000: 4s. Last read position: chr1:165,016,385
INFO 2023-02-02 13:08:43 SinglePassSamProgram Processed 43,000,000 records. Elapsed time: 00:03:39s. Time for last 1,000,000: 4s. Last read position: chr1:168,968,109
INFO 2023-02-02 13:08:48 SinglePassSamProgram Processed 44,000,000 records. Elapsed time: 00:03:43s. Time for last 1,000,000: 4s. Last read position: chr1:172,988,366
INFO 2023-02-02 13:08:52 SinglePassSamProgram Processed 45,000,000 records. Elapsed time: 00:03:48s. Time for last 1,000,000: 4s. Last read position: chr1:176,949,619
INFO 2023-02-02 13:08:57 SinglePassSamProgram Processed 46,000,000 records. Elapsed time: 00:03:53s. Time for last 1,000,000: 4s. Last read position: chr1:180,949,338
INFO 2023-02-02 13:09:02 SinglePassSamProgram Processed 47,000,000 records. Elapsed time: 00:03:57s. Time for last 1,000,000: 4s. Last read position: chr1:184,807,396
INFO 2023-02-02 13:09:06 SinglePassSamProgram Processed 48,000,000 records. Elapsed time: 00:04:02s. Time for last 1,000,000: 4s. Last read position: chr1:189,053,872
INFO 2023-02-02 13:09:11 SinglePassSamProgram Processed 49,000,000 records. Elapsed time: 00:04:07s. Time for last 1,000,000: 4s. Last read position: chr1:193,313,533
INFO 2023-02-02 13:09:16 SinglePassSamProgram Processed 50,000,000 records. Elapsed time: 00:04:11s. Time for last 1,000,000: 4s. Last read position: chr1:197,722,580
INFO 2023-02-02 13:09:20 SinglePassSamProgram Processed 51,000,000 records. Elapsed time: 00:04:16s. Time for last 1,000,000: 4s. Last read position: chr1:201,594,892
INFO 2023-02-02 13:09:25 SinglePassSamProgram Processed 52,000,000 records. Elapsed time: 00:04:21s. Time for last 1,000,000: 4s. Last read position: chr1:205,296,228
INFO 2023-02-02 13:09:30 SinglePassSamProgram Processed 53,000,000 records. Elapsed time: 00:04:26s. Time for last 1,000,000: 4s. Last read position: chr1:209,114,496
INFO 2023-02-02 13:09:35 SinglePassSamProgram Processed 54,000,000 records. Elapsed time: 00:04:30s. Time for last 1,000,000: 4s. Last read position: chr1:212,990,736
INFO 2023-02-02 13:09:39 SinglePassSamProgram Processed 55,000,000 records. Elapsed time: 00:04:35s. Time for last 1,000,000: 4s. Last read position: chr1:217,217,647
INFO 2023-02-02 13:09:44 SinglePassSamProgram Processed 56,000,000 records. Elapsed time: 00:04:40s. Time for last 1,000,000: 4s. Last read position: chr1:221,387,522
INFO 2023-02-02 13:09:49 SinglePassSamProgram Processed 57,000,000 records. Elapsed time: 00:04:45s. Time for last 1,000,000: 5s. Last read position: chr1:225,314,304
INFO 2023-02-02 13:09:54 SinglePassSamProgram Processed 58,000,000 records. Elapsed time: 00:04:49s. Time for last 1,000,000: 4s. Last read position: chr1:228,891,181
INFO 2023-02-02 13:09:58 SinglePassSamProgram Processed 59,000,000 records. Elapsed time: 00:04:54s. Time for last 1,000,000: 4s. Last read position: chr1:232,844,624
INFO 2023-02-02 13:10:03 SinglePassSamProgram Processed 60,000,000 records. Elapsed time: 00:04:59s. Time for last 1,000,000: 4s. Last read position: chr1:236,642,862
INFO 2023-02-02 13:10:08 SinglePassSamProgram Processed 61,000,000 records. Elapsed time: 00:05:03s. Time for last 1,000,000: 4s. Last read position: chr1:241,003,784
INFO 2023-02-02 13:10:13 SinglePassSamProgram Processed 62,000,000 records. Elapsed time: 00:05:08s. Time for last 1,000,000: 5s. Last read position: chr1:245,019,923
ERROR 2023-02-02 13:10:18 Slice Reference MD5 mismatch for slice 0:248741017-248780008, GATGGGAGAT...AGGGAAATAT
[Thu Feb 02 13:10:18 CST 2023] picard.analysis.CollectInsertSizeMetrics done. Elapsed time: 5.23 minutes.
Runtime.totalMemory()=5051514880
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.cram.CRAMException: Reference sequence MD5 mismatch for slice: sequence id 0, start 248741017, span 38992, expected MD5 37a71701c8b7513578af280cdbcc4bda
at htsjdk.samtools.CRAMIterator.nextContainer(CRAMIterator.java:187)
at htsjdk.samtools.CRAMIterator.hasNext(CRAMIterator.java:261)
at htsjdk.samtools.SamReader$AssertingIterator.hasNext(SamReader.java:592)
at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:129)
at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:77)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
The previous error was from another sample, too; that was the only one I had recorded. That's why the chromosome positions are not exactly the same.
@NeginValizadegan For some reason you appear to be running very old Picard versions. This most recent error message you posted is coming from an even older version of Picard (2.9.0-1-gf5b9f50-SNAPSHOT) than the first one you posted (which you said was 2.10.1). I would suggest upgrading to 2.27.5.
Hmm, strange. How do you know which version it is? It would be helpful for me to know that for troubleshooting.
Oh right, I just saw it. Strange. I will follow up with our IT to see why this discrepancy exists.
Okay, apparently there is no version number in the picard.jar filename downloaded from https://github.com/broadinstitute/picard/releases, so if easybuild detects a cached copy, it uses that instead of downloading a fresh one; the cached copy was an older version. They forced it to download, and now the version is correct. I will retry and see if the error persists. Thanks for catching that.
That seems to have helped. Thanks so much!
👍
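A note for future readers who need to confirm which Picard build a module system actually handed them: a jar is just a zip archive, and Picard releases record their version in the manifest's Implementation-Version entry (treat that exact key as an assumption for very old builds). A sketch in Python that reads it without running the jar:

```python
import io
import zipfile

def jar_implementation_version(jar):
    """Read Implementation-Version from a jar's MANIFEST.MF, if present.

    `jar` can be a file path or a file-like object; the manifest is
    plain "Key: Value" lines inside META-INF/MANIFEST.MF.
    """
    with zipfile.ZipFile(jar) as zf:
        manifest = zf.read("META-INF/MANIFEST.MF").decode("utf-8", "replace")
    for line in manifest.splitlines():
        if line.startswith("Implementation-Version:"):
            return line.split(":", 1)[1].strip()
    return None

# Self-contained demo with a fake jar built in memory; on a real install
# you would pass a path such as
# "/home/apps/software/picard/2.27.5-Java-1.8.0_201/picard.jar" instead.
fake_jar = io.BytesIO()
with zipfile.ZipFile(fake_jar, "w") as zf:
    zf.writestr("META-INF/MANIFEST.MF",
                "Manifest-Version: 1.0\r\nImplementation-Version: 2.27.5\r\n")
version = jar_implementation_version(fake_jar)
```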
This CRAM file and its reference are identical to those I have in the cloud because I uploaded these exact files to the cloud after downloading them from the 1000 Genomes Project FTP site. Back then, in the cloud, I was able to use samtools to decram and index this CRAM file alongside 39 others.
On our local server, I cannot get the read walkers PrintReads or CalculateTargetCoverage to correctly decipher the CRAM. Both tools give the same error.
Here is the PrintReads command:
And here is the error:
This error occurs fairly soon after launching the job, after the progress meter shows the tool iterating over chromosome 1.