broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

gatk GenotypeGVCFs error: A USER ERROR has occurred: Bad input: Presence of '-RAW_MQ' annotation is detected. #8574

Closed panpanchen123 closed 10 months ago

panpanchen123 commented 10 months ago

Instructions

gatk version 4.4.0.0

When I run gatk GenotypeGVCFs, it shows this error:

A USER ERROR has occurred: Bad input: Presence of '-RAW_MQ' annotation is detected. This GATK version expects key RAW_MQandDP with a tuple of sum of squared MQ values and total reads over variant genotypes as the value. This could indicate that the provided input was produced with an older version of GATK. Use the argument '--allow-old-rms-mapping-quality-annotation-data' to override and attempt the deprecated MQ calculation. There may be differences in how newer GATK versions calculate DP and MQ that may result in worse MQ results. Use at your own risk.


I ran gatk HaplotypeCaller with both gatk 4.2 and gatk 4.4, and then used gatk CombineGVCFs (4.4) to combine all of the "gvcf.gz" files.

Please tell me how to solve the above problem.
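For context on the tuple named in the error message: the new-style RAW_MQandDP value holds the sum of squared mapping qualities and the read count, and the RMS mapping quality is the square root of their ratio. The following is a minimal sketch of that arithmetic, not GATK's own code; the function name and the sample value are hypothetical.

```python
import math


def rms_mapping_quality(raw_mq_and_dp: str) -> float:
    """Compute RMS MQ from a 'sum_of_squared_MQ,depth' string (illustrative helper)."""
    sum_sq_str, depth_str = raw_mq_and_dp.split(",")
    sum_sq, depth = int(sum_sq_str), int(depth_str)
    if depth == 0:
        # No reads contribute, so the RMS is undefined.
        return float("nan")
    return math.sqrt(sum_sq / depth)


# Hypothetical record: 30 reads, each with MQ 60, so the sum of squares is 30 * 60**2.
print(rms_mapping_quality("108000,30"))  # 60.0
```

The old RAW_MQ annotation carries only the sum of squares, which is why the depth has to come from elsewhere in the deprecated calculation.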

gokalpcelik commented 10 months ago

Can you check the headers of your gvcf inputs to see if any of them has this old tag?
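One way to run this check across many files is sketched below. It assumes the header meta lines start with `##` as in a standard VCF, and that the INFO IDs of interest are `RAW_MQ` (old) and `RAW_MQandDP` (new), as named in the error message. The function names are hypothetical, and a header-only check is a heuristic: it reports what the file declares, not what every record contains.

```python
import gzip
import re

# Matches VCF meta lines such as: ##INFO=<ID=RAW_MQandDP,Number=2,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def header_lines(path):
    """Yield the '##' meta lines of a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # the '#CHROM' column line ends the meta section
            yield line.rstrip("\n")


def mq_annotation_style(lines):
    """Classify the declared raw MQ annotations as 'old', 'new', 'both' or 'none'."""
    ids = {m.group(1) for m in map(INFO_ID.match, lines) if m}
    has_old, has_new = "RAW_MQ" in ids, "RAW_MQandDP" in ids
    if has_old and has_new:
        return "both"
    if has_old:
        return "old"
    return "new" if has_new else "none"
```

Calling `mq_annotation_style(header_lines(path))` for each input gVCF should point to the files that still declare the old tag.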

panpanchen123 commented 10 months ago

> Can you check the headers of your gvcf inputs to see if any of them has this old tag?

This is the header of the gvcf from gatk 4.4:

fileformat=VCFv4.2

ALT=

FILTER=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">

FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">

FORMAT=

FORMAT=

FORMAT=

GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --emit-ref-confidence GVCF --output CMC_C_1.g.vcf --input CMC_C_1.sorted.markdup.addRG.bam --reference kxc_hic_final.fasta --use-posteriors-to-calculate-qual false --dont-use-dragstr-priors false --use-new-qual-calculator true --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --genotype-assignment-method USE_PLS_TO_ASSIGN --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --flow-likelihood-parallel-threads 0 --flow-likelihood-optimized-comp false --flow-use-t0-tag false --flow-probability-threshold 0.003 --flow-remove-non-single-base-pair-indels false --flow-remove-one-zero-probs false --flow-quantization-bins 121 --flow-fill-empty-bins-value 0.001 --flow-symmetric-indel-probs false --flow-report-insertion-or-deletion false --flow-disallow-probs-larger-than-call false --flow-lump-probs false --flow-retain-max-n-probs-base-format false --flow-probability-scaling-factor 10 --flow-order-cycle-length 4 --flow-number-of-uncertain-flows-to-clip 0 --flow-nucleotide-of-first-uncertain-flow T --keep-boundary-flows false --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --floor-blocks false --indel-size-to-eliminate-in-ref-model 10 --disable-optimizations false --dragen-mode false --flow-mode NONE --apply-bqd false --apply-frd false --disable-spanning-event-genotyping false --transform-dragen-mapping-quality false --mapping-quality-threshold-for-genotyping 20 --max-effective-depth-adjustment-for-frd 0 --just-determine-active-regions false --dont-genotype false --do-not-run-physical-phasing false --do-not-correct-overlapping-quality false --use-filtered-reads-for-annotations false --use-flow-aligner-for-stepwise-hc-filtering false --adaptive-pruning false --do-not-recover-dangling-branches false --recover-dangling-heads false --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --pruning-seeding-lod-threshold 9.210340371976184 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-artificial-haplotype-recovery false --enable-legacy-graph-cycle-detection false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --num-matching-bases-in-dangling-end-to-recover -1 --error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --dragstr-het-hom-ratio 2 --dont-use-dragstr-pair-hmm-scores false --pair-hmm-gap-continuation-penalty 10 --expected-mismatch-rate-for-read-disqualification 0.02 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --disable-symmetric-hmm-normalizing false --disable-cap-base-qualities-to-map-quality false --enable-dynamic-read-disqualification-for-genotyping false --dynamic-read-disqualification-threshold 1.0 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --flow-hmm-engine-min-indel-adjust 6 --flow-hmm-engine-flat-insertion-penatly 45 --flow-hmm-engine-flat-deletion-penatly 45 --pileup-detection false --pileup-detection-enable-indel-pileup-calling false --num-artificial-haplotypes-to-add-per-allele 5 --artifical-haplotype-filtering-kmer-size 10 --pileup-detection-snp-alt-threshold 0.1 --pileup-detection-indel-alt-threshold 0.5 --pileup-detection-absolute-alt-depth 0.0 --pileup-detection-snp-adjacent-to-assembled-indel-range 5 --pileup-detection-bad-read-tolerance 0.0 --pileup-detection-proper-pair-read-badness true --pileup-detection-edit-distance-read-badness-threshold 0.08 --pileup-detection-chimeric-read-badness true --pileup-detection-template-mean-badness-threshold 0.0 --pileup-detection-template-std-badness-threshold 0.0 --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --override-fragment-softclip-check false --min-base-quality-score 10 --smith-waterman JAVA --max-mnp-distance 0 --force-call-filtered-alleles false --reference-model-deletion-quality 30 --soft-clip-low-quality-ends false --allele-informative-reads-overlap-margin 2 --smith-waterman-dangling-end-match-value 25 --smith-waterman-dangling-end-mismatch-penalty -50 --smith-waterman-dangling-end-gap-open-penalty -110 --smith-waterman-dangling-end-gap-extend-penalty -6 --smith-waterman-haplotype-to-reference-match-value 200 --smith-waterman-haplotype-to-reference-mismatch-penalty -150 --smith-waterman-haplotype-to-reference-gap-open-penalty -260 --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 --smith-waterman-read-to-haplotype-match-value 10 --smith-waterman-read-to-haplotype-mismatch-penalty -15 --smith-waterman-read-to-haplotype-gap-open-penalty -30 --smith-waterman-read-to-haplotype-gap-extend-penalty -5 --flow-assembly-collapse-hmer-size 0 --flow-assembly-collapse-partial-mode false --flow-filter-alleles false --flow-filter-alleles-qual-threshold 30.0 --flow-filter-alleles-sor-threshold 3.0 --flow-filter-lone-alleles false --flow-filter-alleles-debug-graphs false --min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-extension-into-assembly-region-padding-legacy 25 --max-reads-per-alignment-start 50 --enable-legacy-assembly-region-trimming false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false --allow-old-rms-mapping-quality-annotation-data false",Version="4.4.0.0",Date="2023?8?21? CST ??5:33:54">

GVCFBlock0-1=minGQ=0(inclusive),maxGQ=1(exclusive)

GVCFBlock1-2=minGQ=1(inclusive),maxGQ=2(exclusive)

GVCFBlock10-11=minGQ=10(inclusive),maxGQ=11(exclusive)

GVCFBlock11-12=minGQ=11(inclusive),maxGQ=12(exclusive)

GVCFBlock12-13=minGQ=12(inclusive),maxGQ=13(exclusive)

GVCFBlock13-14=minGQ=13(inclusive),maxGQ=14(exclusive)

GVCFBlock14-15=minGQ=14(inclusive),maxGQ=15(exclusive)

GVCFBlock15-16=minGQ=15(inclusive),maxGQ=16(exclusive)

GVCFBlock16-17=minGQ=16(inclusive),maxGQ=17(exclusive)

GVCFBlock17-18=minGQ=17(inclusive),maxGQ=18(exclusive)

GVCFBlock18-19=minGQ=18(inclusive),maxGQ=19(exclusive)

GVCFBlock19-20=minGQ=19(inclusive),maxGQ=20(exclusive)

GVCFBlock2-3=minGQ=2(inclusive),maxGQ=3(exclusive)

GVCFBlock20-21=minGQ=20(inclusive),maxGQ=21(exclusive)

GVCFBlock21-22=minGQ=21(inclusive),maxGQ=22(exclusive)

GVCFBlock22-23=minGQ=22(inclusive),maxGQ=23(exclusive)

GVCFBlock23-24=minGQ=23(inclusive),maxGQ=24(exclusive)

GVCFBlock24-25=minGQ=24(inclusive),maxGQ=25(exclusive)

GVCFBlock25-26=minGQ=25(inclusive),maxGQ=26(exclusive)

GVCFBlock26-27=minGQ=26(inclusive),maxGQ=27(exclusive)

GVCFBlock27-28=minGQ=27(inclusive),maxGQ=28(exclusive)

GVCFBlock28-29=minGQ=28(inclusive),maxGQ=29(exclusive)

GVCFBlock29-30=minGQ=29(inclusive),maxGQ=30(exclusive)

GVCFBlock3-4=minGQ=3(inclusive),maxGQ=4(exclusive)

GVCFBlock30-31=minGQ=30(inclusive),maxGQ=31(exclusive)

GVCFBlock31-32=minGQ=31(inclusive),maxGQ=32(exclusive)

GVCFBlock32-33=minGQ=32(inclusive),maxGQ=33(exclusive)

GVCFBlock33-34=minGQ=33(inclusive),maxGQ=34(exclusive)

GVCFBlock34-35=minGQ=34(inclusive),maxGQ=35(exclusive)

GVCFBlock35-36=minGQ=35(inclusive),maxGQ=36(exclusive)

GVCFBlock36-37=minGQ=36(inclusive),maxGQ=37(exclusive)

GVCFBlock37-38=minGQ=37(inclusive),maxGQ=38(exclusive)

GVCFBlock38-39=minGQ=38(inclusive),maxGQ=39(exclusive)

GVCFBlock39-40=minGQ=39(inclusive),maxGQ=40(exclusive)

GVCFBlock4-5=minGQ=4(inclusive),maxGQ=5(exclusive)

GVCFBlock40-41=minGQ=40(inclusive),maxGQ=41(exclusive)

GVCFBlock41-42=minGQ=41(inclusive),maxGQ=42(exclusive)

GVCFBlock42-43=minGQ=42(inclusive),maxGQ=43(exclusive)

GVCFBlock43-44=minGQ=43(inclusive),maxGQ=44(exclusive)

GVCFBlock44-45=minGQ=44(inclusive),maxGQ=45(exclusive)

GVCFBlock45-46=minGQ=45(inclusive),maxGQ=46(exclusive)

GVCFBlock46-47=minGQ=46(inclusive),maxGQ=47(exclusive)

GVCFBlock47-48=minGQ=47(inclusive),maxGQ=48(exclusive)

GVCFBlock48-49=minGQ=48(inclusive),maxGQ=49(exclusive)

GVCFBlock49-50=minGQ=49(inclusive),maxGQ=50(exclusive)

GVCFBlock5-6=minGQ=5(inclusive),maxGQ=6(exclusive)

GVCFBlock50-51=minGQ=50(inclusive),maxGQ=51(exclusive)

GVCFBlock51-52=minGQ=51(inclusive),maxGQ=52(exclusive)

GVCFBlock52-53=minGQ=52(inclusive),maxGQ=53(exclusive)

GVCFBlock53-54=minGQ=53(inclusive),maxGQ=54(exclusive)

GVCFBlock54-55=minGQ=54(inclusive),maxGQ=55(exclusive)

GVCFBlock55-56=minGQ=55(inclusive),maxGQ=56(exclusive)

GVCFBlock56-57=minGQ=56(inclusive),maxGQ=57(exclusive)

GVCFBlock57-58=minGQ=57(inclusive),maxGQ=58(exclusive)

GVCFBlock58-59=minGQ=58(inclusive),maxGQ=59(exclusive)

GVCFBlock59-60=minGQ=59(inclusive),maxGQ=60(exclusive)

GVCFBlock6-7=minGQ=6(inclusive),maxGQ=7(exclusive)

GVCFBlock60-70=minGQ=60(inclusive),maxGQ=70(exclusive)

GVCFBlock7-8=minGQ=7(inclusive),maxGQ=8(exclusive)

GVCFBlock70-80=minGQ=70(inclusive),maxGQ=80(exclusive)

GVCFBlock8-9=minGQ=8(inclusive),maxGQ=9(exclusive)

GVCFBlock80-90=minGQ=80(inclusive),maxGQ=90(exclusive)

GVCFBlock9-10=minGQ=9(inclusive),maxGQ=10(exclusive)

GVCFBlock90-99=minGQ=90(inclusive),maxGQ=99(exclusive)

GVCFBlock99-100=minGQ=99(inclusive),maxGQ=100(exclusive)

INFO=

INFO=

INFO=

INFO=

INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">

INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">

INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">

INFO=

INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw data (sum of squared MQ and total depth) for improved RMS Mapping Quality calculation. Incompatible with deprecated RAW_MQ formulation.">

INFO=

And this is the header of the gvcf from gatk 4.2:

fileformat=VCFv4.2

ALT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">

FORMAT=

FORMAT=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">

INFO=

INFO=

INFO=

INFO=

INFO=

SentieonCommandLine.Haplotyper=<ID=Haplotyper,Version="sentieon-genomics-202112.06",Date="2023-10-16T04:14:04Z",CommandLine="/WORK/mge_test_4/CHSNP/app/sentieon-genomics-202112.06/libexec/driver -r /WORK2/mge_test_4/CHSNP/PRECIPATH001/genome/kxc_hic_final.fasta -t 24 -i ZD_C_3.deduped.bam --algo Haplotyper --emit_conf=30 --call_conf=30 --emit_mode gvcf ZD_C_3-output-hc.gvcf.vcf.gz">

gokalpcelik commented 10 months ago

This gvcf was created by Sentieon, not GATK HaplotypeCaller, as can be seen in the header. You need to use the parameter named in the error message to allow the old-style RAW_MQ annotation.
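The producing tool can be read from the command-line records in the header, as with the `GATKCommandLine` and `SentieonCommandLine.Haplotyper` lines pasted above. A rough sketch follows, assuming the meta lines keep their standard `##` prefix in the actual files; the regular expression and function name are illustrative only and may not cover every tool's header convention.

```python
import re

# Captures e.g. '##GATKCommandLine=<ID=HaplotypeCaller,...' or
# '##SentieonCommandLine.Haplotyper=<ID=Haplotyper,...'
COMMAND_LINE = re.compile(r"^##([A-Za-z]+CommandLine(?:\.\w+)?)=<ID=([^,>]+)")


def producing_tools(header_lines):
    """Return (header_key, tool_id) pairs for every command-line record found."""
    found = []
    for line in header_lines:
        match = COMMAND_LINE.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


example = [
    '##fileformat=VCFv4.2',
    '##SentieonCommandLine.Haplotyper=<ID=Haplotyper,Version="sentieon-genomics-202112.06">',
]
print(producing_tools(example))
```

A file whose only command-line record has a Sentieon key was not written by GATK HaplotypeCaller, whatever GATK version was installed at the time.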

panpanchen123 commented 10 months ago

> This gvcf was created by Sentieon, not GATK HaplotypeCaller, as can be seen in the header. You need to use the parameter named in the error message to allow the old-style RAW_MQ annotation.

You mean the argument --allow-old-rms-mapping-quality-annotation-data, right?

gokalpcelik commented 10 months ago

Yes that's the one.
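As a sketch of how the override might be passed, the snippet below assembles a GenotypeGVCFs command with the flag appended. The reference name is taken from the headers in this thread; the input and output file names are hypothetical placeholders, and the helper function is only an illustration. It builds and prints the command without running it.

```python
import shlex

OVERRIDE_FLAG = "--allow-old-rms-mapping-quality-annotation-data"


def genotype_gvcfs_command(reference, combined_gvcf, output_vcf, allow_old_mq=True):
    """Build the argument list for 'gatk GenotypeGVCFs' (illustrative helper)."""
    cmd = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", combined_gvcf,
        "-O", output_vcf,
    ]
    if allow_old_mq:
        # Accept inputs that carry the deprecated RAW_MQ annotation.
        cmd.append(OVERRIDE_FLAG)
    return cmd


# 'combined.g.vcf.gz' and 'joint.vcf.gz' are placeholder names, not from the thread.
cmd = genotype_gvcfs_command("kxc_hic_final.fasta", "combined.g.vcf.gz", "joint.vcf.gz")
print(shlex.join(cmd))
```

As the error message itself warns, the deprecated calculation may give worse MQ results, so regenerating the Sentieon gvcfs with GATK HaplotypeCaller is the alternative if consistent MQ values matter.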

panpanchen123 commented 10 months ago

> Yes that's the one.

OK, thank you very much.