broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

GATK4.1.9 Mutect2 Header Issue Leads to 'Invalid' VCF #6931

Open pbrennan13 opened 4 years ago

pbrennan13 commented 4 years ago

Bug Report

Affected tool(s) or class(es)

Mutect2/FilterMutectCalls

Affected version(s)

Both 4.1.6 and 4.1.9 are affected. Other versions may be affected as well, but I have not tested them.

Description

Output from vcf-validator:

   7       ..      INFO field at chr1:160882084 .. INFO tag [AS_SB_TABLE=719,346|0,47] expected different number of values (1),INFO tag [AS_FilterStatus=weak_evidence,base_qual,strand_bias] expected different number of values (expected 1, found 3)
    7       ..      INFO field at chr1:230995820 .. INFO tag [AS_SB_TABLE=444,391|4,6|5,6] expected different number of values (1),INFO tag [AS_FilterStatus=weak_evidence|weak_evidence] expected different number of values (expected 2, found 1)
    6       ..      INFO field at chr2:169905124 .. INFO tag [AS_SB_TABLE=387,312|2,2] expected different number of values (1),INFO tag [AS_FilterStatus=weak_evidence,base_qual] expected different number of values (expected 1, found 2)
    6       ..      INFO field at chr3:42210085 .. INFO tag [AS_SB_TABLE=15,24|3,2|206,188|174,140|3,1] expected different number of values (1),INFO tag [AS_FilterStatus=weak_evidence|SITE|SITE|weak_evidence] expected different number of values (expected 4, found 1)
    5       ..      INFO field at chr1:82186950 .. INFO tag [AS_SB_TABLE=1,5|2,2|11,22|53,59|30,35|12,10] expected different number of values (1),INFO tag [AS_FilterStatus=weak_evidence|SITE|SITE|SITE|SITE] expected different number of values (expected 5, found 1)
    5       ..      INFO field at chr3:38585868 .. INFO tag [AS_FilterStatus=weak_evidence,position] expected different number of values (expected 1, found 2),INFO tag [AS_SB_TABLE=334,422|0,7] expected different number of values (1)
    5       ..      INFO field at chr3:67380165 .. INFO tag [AS_SB_TABLE=57,75|4,4|27,28|74,69|13,19|5,4] expected different number of values (1),INFO tag [AS_FilterStatus=weak_evidence|SITE|SITE|SITE|weak_evidence] expected different number of values (expected 5, found 1)
    5       ..      INFO field at chr8:6312001 .. INFO tag [AS_FilterStatus=SITE|SITE|SITE|SITE|SITE] expected different number of values (expected 5, found 1),INFO tag [AS_SB_TABLE=36,22|10,5|22,16|46,32|27,19|13,5] expected different number of values (1)
    4       ..      INFO field at chr1:20868110 .. INFO tag [AS_FilterStatus=weak_evidence,base_qual] expected different number of values (expected 1, found 2),INFO tag [AS_SB_TABLE=73,75|26,0] expected different number of values (1)
    4       ..      INFO field at chr1:79524210 .. INFO tag [AS_FilterStatus=weak_evidence|SITE|SITE|weak_evidence] expected different number of values (expected 4, found 1),INFO tag [AS_SB_TABLE=122,108|4,8|35,36|22,12|2,4] expected different number of values (1)
    4       ..      INFO field at chr2:227294944 .. INFO tag [AS_FilterStatus=weak_evidence|SITE|SITE|SITE|weak_evidence] expected different number of values (expected 5, found 1),INFO tag [AS_SB_TABLE=101,135|4,6|30,26|136,173|18,35|3,4] expected different number of values (1)
    4       ..      INFO field at chr3:149589646 .. INFO tag [AS_FilterStatus=weak_evidence|SITE|weak_evidence] expected different number of values (expected 3, found 1),INFO tag [AS_SB_TABLE=97,103|9,14|12,9|4,3] expected different number of values (1)
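The "expected N, found M" messages above are consistent with a validator that counts comma-separated values, which is how the VCF specification delimits multi-valued INFO fields. The sketch below is only an illustration of that counting rule, not vcf-validator's actual code; the function names are hypothetical.

```python
def count_info_values(value: str) -> int:
    """Count values the way the VCF spec delimits them: by commas."""
    return len(value.split(","))


def check_number_a(value: str, n_alt: int) -> tuple:
    """For a Number=A field, compare the value count with the ALT allele count.

    Returns (is_valid, expected, found).
    """
    found = count_info_values(value)
    return found == n_alt, n_alt, found


# One ALT allele, but three comma-separated filter names.
print(check_number_a("weak_evidence,base_qual,strand_bias", 1))
# Two ALT alleles separated by '|', which a comma-based count sees as one value.
print(check_number_a("weak_evidence|weak_evidence", 2))
```

Under this rule, a field that separates alleles with | and uses commas inside each allele cannot satisfy Number=A whenever an allele carries more than one filter or a site has more than one ALT allele.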

We have identified the following header lines as the cause of the issue:

ID=AS_ReadPosRankSum,Number=A
ID=AS_FilterStatus,Number=A
ID=AS_MQ,Number=A
ID=AS_SB_TABLE,Number=1
ID=AS_UNIQ_ALT_READ_COUNT,Number=A

After FilterMutectCalls produces our final VCF, we have to manually change every 'A' and '1' to '.' in the Number field of the header lines above. This seems to resolve the issue.
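A minimal sketch of that manual header edit, assuming an uncompressed VCF read line by line. The function name and the AFFECTED_IDS set are hypothetical; the IDs are the ones listed in this report, and the set can be narrowed if only some of them turn out to be wrong.

```python
import re

# IDs taken from the header lines listed in this report (adjust as needed).
AFFECTED_IDS = {
    "AS_ReadPosRankSum",
    "AS_FilterStatus",
    "AS_MQ",
    "AS_SB_TABLE",
    "AS_UNIQ_ALT_READ_COUNT",
}

# Captures everything up to and including "Number=", plus the ID itself.
_INFO_NUMBER_RE = re.compile(r"^(##INFO=<ID=([^,]+),Number=)[^,]+")


def relax_number(line: str, ids=AFFECTED_IDS) -> str:
    """Set Number=. on the INFO header line of an affected ID.

    Any other line (other headers, data records) is returned unchanged.
    """
    match = _INFO_NUMBER_RE.match(line)
    if match and match.group(2) in ids:
        return match.group(1) + "." + line[match.end():]
    return line


header = '##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="example">'
print(relax_number(header))
```

Applying relax_number to every line of the file and writing the result out leaves the records untouched and only loosens the declared value count in the header.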

Below is the Mutect2 command we are using:

gatk Mutect2 \
    -I tumor.bam \
    -I normal.bam \
    -normal normal_sample_name \
    -mbq 17 \
    --initial-tumor-lod 6.0 \
    -A AS_RMSMappingQuality -A MappingQualityRankSumTest -A AS_ReadPosRankSumTest -A FragmentLength \
    --germline-resource af-only-gnomad.raw.sites.b37.vcf.gz \
    -O mutect2.vcf \
    -R human_g1k_v37_decoy.fasta

Any help in this matter would be greatly appreciated!

davidbenjamin commented 4 years ago

@pbrennan13 after #6858 goes in I can apply a similar fix to the other annotations. Are you sure that AS_MQ and AS_ReadPosRankSum are problematic? They seem to be correctly described as length-A lists of floats.

berguner commented 4 years ago

We also experienced problems with the wrong number attribute for the AS_SB_TABLE and AS_FilterStatus annotations.

Besides the Number attribute, AS_SB_TABLE uses non-standard field separation: it uses | to separate alleles and a comma to separate the values for each allele. AFAIK, the common practice is to use the comma as the primary delimiter between alleles, with other delimiters used inside each allele-specific annotation to separate its values.
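Because of this layout, downstream code has to split AS_SB_TABLE on | first and on commas second. A minimal sketch, assuming each allele carries exactly two integer counts (taken here to be forward and reverse strand read counts) and that the first entry is the reference allele; the function name is hypothetical.

```python
def parse_as_sb_table(value: str) -> list:
    """Split an AS_SB_TABLE value into one (forward, reverse) pair per allele.

    Alleles are separated by '|'; the two counts within an allele by ','.
    """
    table = []
    for allele_field in value.split("|"):
        forward, reverse = allele_field.split(",")
        table.append((int(forward), int(reverse)))
    return table


# Value taken from the validator output in this report (chr1:230995820).
print(parse_as_sb_table("444,391|4,6|5,6"))
```

A parser that splits on commas first would instead see fragments such as "391|4", which is why tools that follow the usual comma-per-allele convention miscount this field.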

TnakaNY commented 3 years ago

Hi pbrennan13 and all, I get the same error when merging Mutect2 VCF files into one with bcftools merge.

I am a beginner at editing VCF files.

How did you change all 'A' and '1' -> '.' in the Mutect2 VCF files? Could you show me, or write a brief command here?

It would be a great help for beginners.

BJWiley233 commented 3 years ago

@TnakaNY maybe not the best way, but I gunzipped all the files, ran sed -i 's/##INFO=<ID=AS_FilterStatus,Number=A/##INFO=<ID=AS_FilterStatus,Number=1/', then bgzipped and indexed them again.

TnakaNY commented 3 years ago

Hi, Thank you for the suggestion. Let me try it. Best, AT
