WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

Change in CHROM, INFO and FORMAT fields #95

Open moneterg opened 4 years ago

Hi,

I believe this is a bug, but I'm not sure. Either way, I'd appreciate some help.

I ran ANNOVAR with the command below (it uses a lot of databases):

Annovar command

table_annovar.pl --verbose $input $dbPath \
--buildver hg19 --outfile myanno --remove \
--protocol refGene,cytoBand,phastCons46way,tfbsConsSites,wgRna,targetScanS,genomicSuperDups,dgvMerged,gwasCatalog,wgEncodeBroadHistoneGm12878H3k4me1StdPk,wgEncodeBroadHistoneGm12878H3k27acStdPk,wgEncodeBroadHistoneGm12878H3k4me3StdPk,wgEncodeBroadHistoneGm12878H3k9acStdPk,wgEncodeBroadHistoneGm12878H3k36me3StdPk,wgEncodeBroadHistoneGm12878H3k79me2StdPk,wgEncodeBroadHistoneGm12878H3k27me3StdPkV2,wgEncodeBroadHistoneGm12878H3k9me3StdPk,wgEncodeUwDnaseGm12878HotspotsRep1,wgEncodeBroadHistoneGm12878CtcfStdPk,dbnsfp35a,dbscsnv11,intervar_20180118,cg69,esp6500siv2_all,esp6500siv2_ea,esp6500siv2_aa,exac03,gnomad211_exome,gnomad211_genome,kaviar_20150923,hrcr1,abraom,1000g2015aug_all,gme,mcap13,revel,avsnp150,snp151,clinvar_20190305,popfreq_max_20150413,popfreq_all_20150413,mitimpact24,regsnpintron,gwava \
--operation g,r,r,r,r,r,r,r,r,r,r,r,r,r,r,r,r,r,r,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f \
--nastring . --vcfinput --polish --thread 10 --maxgenethread 10;
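
With this many databases it is easy to miscount, so one thing worth ruling out first is a mismatch between the `--protocol` and `--operation` lists. The snippet below is only a minimal sketch of such a check: the function name `check_protocol_operation` is hypothetical (it is not part of ANNOVAR), and the set of accepted operation codes (`g`, `gx`, `r`, `f`) is my assumption based on the table_annovar.pl documentation. The short lists in the example are illustrative, not my full command.

```python
# Assumption: table_annovar.pl accepts these operation codes
# (gene-based, gene-based with cross-reference, region-based, filter-based).
VALID_OPERATIONS = {"g", "gx", "r", "f"}


def check_protocol_operation(protocol, operation):
    """Return a list of human-readable problems; an empty list means none found."""
    protocols = protocol.split(",")
    operations = operation.split(",")
    problems = []

    # Every protocol needs exactly one operation, in the same order.
    if len(protocols) != len(operations):
        problems.append(
            f"{len(protocols)} protocols but {len(operations)} operations"
        )

    # Flag anything that is not a known operation code.
    for op in operations:
        if op not in VALID_OPERATIONS:
            problems.append(f"unknown operation {op!r}")

    return problems


# Illustrative, shortened lists (not the full command above).
print(check_protocol_operation("refGene,cytoBand,exac03", "g,r,f"))
print(check_protocol_operation("refGene,cytoBand,exac03", "g,r"))
```

Pasting the real `--protocol` and `--operation` strings into the two arguments would show whether the lists line up; an empty list means this particular cause can be excluded.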

My input was a multi-sample VCF. Here is an abbreviated example of one variant record (some INFO fields removed).

input.vcf:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO      FORMAT      Sample1
chr2    28816821    .   T   TTTTG   261.61  PASS    AC=1;AF=0.005495;AN=182;(...);VariantType=INSERTION.NOVEL_4;culprit=FS  GT:AD:DP:FT:GQ:MLPSAC:MLPSAF:MQ0:PGT:PID:PL:PP  0/0:16,0:16:PASS:24:0:0:0:.:.:0,24,360:0,24,360

After ANNOVAR ran, this variant came out malformed. Here is the same record from the output, abbreviated in the same way.

myanno.vcf (the problems are listed after the record):

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO      FORMAT      Sample1
28816821 .   T   TTTTG   261.61  PASS    AC=1;AF=0.005495;AN=182;(...);VariantType=INSERTION.NOVEL_4;culprit=FS  GT:AD:DP:FT:GQ:MLPSAC:MLPSAF:MQ0:PGT:PID:PL:PP;ANNOVAR_DATE=2018-04-16;Func.refGene=intronic;(...);ALLELE_END    0/0:16,0:16:PASS:24:0:0:0:.:.:0,24,360:0,24,360

It seems to me that ANNOVAR corrupted some lines of my VCF output. The problems I noticed were:

  1. The CHROM column disappeared.
  2. In the INFO field, a TAB was inserted after the "culprit" tag.
  3. The FORMAT field disappeared.
  4. The FORMAT field was merged with the ANNOVAR annotation field.
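
To find out how many records are affected, the symptoms above can be checked mechanically. The following is a rough sketch, not an official ANNOVAR tool: the function name `find_malformed_records` and the toy records are hypothetical. It assumes a tab-separated VCF with a `#CHROM` header line, and it flags any record whose column count differs from the header or whose ninth column (FORMAT) contains INFO-style `key=value` text.

```python
def find_malformed_records(lines):
    """Return (line_number, reason) pairs for records that look corrupted."""
    expected = None  # number of columns in the #CHROM header line
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            expected = len(fields)
            continue
        # Symptom 1: a missing CHROM column shortens the record by one field.
        if expected is not None and len(fields) != expected:
            problems.append(
                (number, f"{len(fields)} columns, expected {expected}")
            )
            continue
        # Symptoms 2-4: annotations appended to FORMAT instead of INFO.
        if len(fields) > 8 and ("=" in fields[8] or ";" in fields[8]):
            problems.append(
                (number, "FORMAT column contains INFO-style text")
            )
    return problems


# Toy records (hypothetical, heavily shortened) to illustrate the check.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1"
good = "chr2\t100\t.\tT\tTTTTG\t50\tPASS\tAC=1\tGT:DP\t0/0:16"
no_chrom = "100\t.\tT\tTTTTG\t50\tPASS\tAC=1\tGT:DP;Func.refGene=intronic\t0/0:16"
bad_format = "chr2\t100\t.\tT\tTTTTG\t50\tPASS\tAC=1\tGT:DP;Func.refGene=intronic\t0/0:16"

for problem in find_malformed_records([header, good, no_chrom, bad_format]):
    print(problem)
```

On a real file, passing an open file handle (for example `open("myanno.vcf")`) in place of the toy list should work the same way, since the function only iterates over lines.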

The complete variant record is very long, so I have put it at the end of this post.

input.vcf

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO      FORMAT      Sample1
chr2    28816821    .   T   TTTTG   261.61  PASS    AC=1;AF=0.005495;AN=182;BaseQRankSum=-0.571;ClippingRankSum=0;DP=1421;ExcessHet=3.0103;FS=0;GC=34.65;GQ_MEAN=22.78;GQ_STDDEV=13.96;HRun=0;HW=0;InbreedingCoeff=-0.0237;LowMQ=0,0,1462;MLEAC=1;MLEAF=0.005495;MQ=61.34;MQRankSum=1.11;NCC=0;NDA=1;PG=0,0,0;PercentNBase=0;QD=17.44;ReadPosRankSum=0.057;SOR=0.223;Samples=SRR407224;VQSLOD=2.45;VariantType=INSERTION.NOVEL_4;culprit=FS GT:AD:DP:FT:GQ:MLPSAC:MLPSAF:MQ0:PGT:PID:PL:PP  0/0:16,0:16:PASS:24:0:0:0:.:.:0,24,360:0,24,360

myanno.vcf

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO      FORMAT      Sample1
28816821    .   T   TTTTG   261.61  PASS    AC=1;AF=0.005495;AN=182;BaseQRankSum=-0.571;ClippingRankSum=0;DP=1421;ExcessHet=3.0103;FS=0;GC=34.65;GQ_MEAN=22.78;GQ_STDDEV=13.96;HRun=0;HW=0;InbreedingCoeff=-0.0237;LowMQ=0,0,1462;MLEAC=1;MLEAF=0.005495;MQ=61.34;MQRankSum=1.11;NCC=0;NDA=1;PG=0,0,0;PercentNBase=0;QD=17.44;ReadPosRankSum=0.057;SOR=0.223;Samples=SRR407224;VQSLOD=2.45;VariantType=INSERTION.NOVEL_4;culprit=FS GT:AD:DP:FT:GQ:MLPSAC:MLPSAF:MQ0:PGT:PID:PL:PP;ANNOVAR_DATE=2018-04-16;Func.refGene=intronic;Gene.refGene=PLB1;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;cytoBand=2p23.2;phastCons46way=Name\x3dchr2.1168883;tfbsConsSites=.;wgRna=.;targetScanS=.;genomicSuperDups=.;dgvMerged=.;gwasCatalog=.;wgEncodeBroadHistoneGm12878H3k4me1StdPk=.;wgEncodeBroadHistoneGm12878H3k27acStdPk=.;wgEncodeBroadHistoneGm12878H3k4me3StdPk=.;wgEncodeBroadHistoneGm12878H3k9acStdPk=.;wgEncodeBroadHistoneGm12878H3k36me3StdPk=.;wgEncodeBroadHistoneGm12878H3k79me2StdPk=.;wgEncodeBroadHistoneGm12878H3k27me3StdPkV2=Name\x3d.;wgEncodeBroadHistoneGm12878H3k9me3StdPk=.;wgEncodeUwDnaseGm12878HotspotsRep1=.;wgEncodeBroadHistoneGm12878CtcfStdPk=.;SIFT_score=.;SIFT_converted_rankscore=.;SIFT_pred=.;Polyphen2_HDIV_score=.;Polyphen2_HDIV_rankscore=.;Polyphen2_HDIV_pred=.;Polyphen2_HVAR_score=.;Polyphen2_HVAR_rankscore=.;Polyphen2_HVAR_pred=.;LRT_score=.;LRT_converted_rankscore=.;LRT_pred=.;MutationTaster_score=.;MutationTaster_converted_rankscore=.;MutationTaster_pred=.;MutationAssessor_score=.;MutationAssessor_score_rankscore=.;MutationAssessor_pred=.;FATHMM_score=.;FATHMM_converted_rankscore=.;FATHMM_pred=.;PROVEAN_score=.;PROVEAN_converted_rankscore=.;PROVEAN_pred=.;VEST3_score=.;VEST3_rankscore=.;MetaSVM_score=.;MetaSVM_rankscore=.;MetaSVM_pred=.;MetaLR_score=.;MetaLR_rankscore=.;MetaLR_pred=.;M-CAP_score=.;M-CAP_rankscore=.;M-CAP_pred=.;REVEL_score=.;REVEL_rankscore=.;MutPred_score=.;MutPred_rankscore=.;CADD_raw=.;CADD_raw_rankscore=.;CADD_phred=.;DANN_score=.;DANN_rankscore=.;
fathmm-MKL_coding_score=.;fathmm-MKL_coding_rankscore=.;fathmm-MKL_coding_pred=.;Eigen_coding_or_noncoding=.;Eigen-raw=.;Eigen-PC-raw=.;GenoCanyon_score=.;GenoCanyon_score_rankscore=.;integrated_fitCons_score=.;integrated_fitCons_score_rankscore=.;integrated_confidence_value=.;GERP++_RS=.;GERP++_RS_rankscore=.;phyloP100way_vertebrate=.;phyloP100way_vertebrate_rankscore=.;phyloP20way_mammalian=.;phyloP20way_mammalian_rankscore=.;phastCons100way_vertebrate=.;phastCons100way_vertebrate_rankscore=.;phastCons20way_mammalian=.;phastCons20way_mammalian_rankscore=.;SiPhy_29way_logOdds=.;SiPhy_29way_logOdds_rankscore=.;Interpro_domain=.;GTEx_V6p_gene=.;GTEx_V6p_tissue=.;dbscSNV_ADA_SCORE=.;dbscSNV_RF_SCORE=.;InterVar_automated=.;PVS1=.;PS1=.;PS2=.;PS3=.;PS4=.;PM1=.;PM2=.;PM3=.;PM4=.;PM5=.;PM6=.;PP1=.;PP2=.;PP3=.;PP4=.;PP5=.;BA1=.;BS1=.;BS2=.;BS3=.;BS4=.;BP1=.;BP2=.;BP3=.;BP4=.;BP5=.;BP6=.;BP7=.;cg69=.;esp6500siv2_all=.;esp6500siv2_ea=.;esp6500siv2_aa=.;ExAC_ALL=0.0004;ExAC_AFR=0.0065;ExAC_AMR=0.0002;ExAC_EAS=0;ExAC_FIN=0;ExAC_NFE=0;ExAC_OTH=0;ExAC_SAS=0;AF=0.0004;AF_popmax=0.0065;AF_male=0.0003;AF_female=0.0006;AF_raw=0.0004;AF_afr=0.0065;AF_sas=0;AF_amr=0.0003;AF_eas=0;AF_nfe=0;AF_fin=0;AF_asj=0;AF_oth=0;non_topmed_AF_popmax=0.0063;non_neuro_AF_popmax=0.0065;non_cancer_AF_popmax=0.0061;controls_AF_popmax=0.0043;AF=0.0018;AF_popmax=0.0064;AF_male=0.0017;AF_female=0.0018;AF_raw=0.0018;AF_afr=0.0064;AF_sas=.;AF_amr=0;AF_eas=0;AF_nfe=0;AF_fin=0;AF_asj=0;AF_oth=0;non_topmed_AF_popmax=0.0065;non_neuro_AF_popmax=0.0078;non_cancer_AF_popmax=.;controls_AF_popmax=0.0091;Kaviar_AF=5.82e-05;Kaviar_AC=9;Kaviar_AN=154602;HRC_AF=.;HRC_AC=.;HRC_AN=.;HRC_non1000G_AF=.;HRC_non1000G_AC=.;HRC_non1000G_AN=.;abraom_freq=0.000000;abraom_filter=VQSRTrancheINDEL99.00to99.90;abraom_cegh_filter=0.00179712;1000g2015aug_all=.;GME_AF=.;GME_NWA=.;GME_NEA=.;GME_AP=.;GME_Israel=.;GME_SD=.;GME_TP=.;GME_CA=.;MCAP13=.;REVEL=rs540553850;avsnp150=rs540553850;snp151=.;CLNALLELEID=.;CLNDN=.;CLNDISDB=.;CLNREVSTAT=
.;CLNSIG=0.0068;PopFreqMax=0.0068;PopFreqMax=0.0018;1000G_ALL=0.0068;1000G_AFR=.;1000G_AMR=.;1000G_EAS=.;1000G_EUR=.;1000G_SAS=0.0004;ExAC_ALL=0.0065;ExAC_AFR=0.0002;ExAC_AMR=0.;ExAC_EAS=0.;ExAC_FIN=0.;ExAC_NFE=0.;ExAC_OTH=0.;ExAC_SAS=.;ESP6500siv2_ALL=.;ESP6500siv2_AA=.;ESP6500siv2_EA=.;CG46=.;MitImpact_id=.;Gene_symbol=.;OXPHOS_complex=.;Ensembl_gene_id=.;Ensembl_protein_id=.;Ensembl_transcript_id=.;Uniprot_name=.;Uniprot_id=.;Ncbi_gene_id=.;Ncbi_protein_id=.;Gene_position=.;AA_position=.;AA_ref=.;AA_alt=.;Codon_substitution=.;PhyloP_100V=.;PhastCons_100V=.;SiteVar=.;PolyPhen2=.;PolyPhen2_score=.;SIFT=.;SIFT_score=.;FatHmm_pred=.;FatHmm_score=.;FatHmmW=.;FatHmmW_score=.;PROVEAN=.;PROVEAN_score=.;MutationAssessor=.;MutationAssessor_score=.;EFIN_SP_score=.;EFIN_SP=.;EFIN_HD_score=.;EFIN_HD=.;CADD_score=.;CADD_phred=.;CADD=.;VEST_pvalue=.;VEST_FDR=.;PANTHER_score=.;PANTHER=.;PhD-SNP_score=.;PhD-SNP=.;SNAP_score=.;SNAP=.;Meta-SNP_score=.;Meta-SNP=.;Meta-SNP_RI=.;CAROL_score=.;CAROL=.;Condel_score=.;Condel=.;COVEC_WMV_score=.;COVEC_WMV=.;PolyPhen2_transf_score=.;PolyPhen2_transf=.;SIFT_transf_score=.;SIFT_transf=.;MutationAssessor_transf_score=.;MutationAssessor_transf=.;CHASM_pvalue=.;CHASM_FDR=.;MISTIC_coevo_sites=.;MISTIC_mean_MI_score=.;COSMIC_id=.;COSMIC_tumor_site=.;COSMIC_samples=.;COSMIC_frequency=.;dbSNP_id=.;Variant_status=.;Associated_disease=.;Variant_class=.;APOGEE_probN=.;APOGEE_probP=.;APOGEE=.;regsnp_fpr=.;regsnp_disease=.;regsnp_splicing_site=.;GWAVA_region_score=.;GWAVA_tss_score=.;GWAVA_unmatched_score=0.005495;ALLELE_END 0/0:16,0:16:PASS:24:0:0:0:.:.:0,24,360:0,24,360

Thank you for your time and patience. Again, I'd appreciate any help with this.

Monete