filter_vep unable to find CSQ info field #431

matmu commented 5 years ago

I have successfully annotated my VCF file using a local installation of the Variant Effect Predictor (VEP). However, the filtering doesn't work. For example, the command

filter_vep -i vep.test.vcf.gz -filter "Existing_variation" -o out.txt --force_overwrite

to keep all variants with a defined field "Existing_variation", doesn't return any variant although there are multiple such variants present in the dataset.
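To make the intended behaviour concrete, here is a rough Python sketch of what I expect the filter to do. It is not filter_vep's code: `has_defined_field` is a hypothetical helper, and it assumes a line should be kept when at least one CSQ block has a non-empty value for the field.

```python
def has_defined_field(info, field_names, wanted, info_field="CSQ"):
    """Return True if any CSQ block in the INFO column has a non-empty
    value for `wanted`. Illustrative only, not filter_vep's logic."""
    if wanted not in field_names:
        return False
    index = field_names.index(wanted)
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key != info_field:
            continue
        # One CSQ value holds several comma-separated blocks,
        # each with pipe-separated fields in header order.
        for block in value.split(","):
            parts = block.split("|")
            if index < len(parts) and parts[index] != "":
                return True
    return False


# Shortened, made-up field list and INFO columns for illustration.
names = ["Allele", "Consequence", "Existing_variation"]
with_id = "DP=10;CSQ=A|intron_variant|rs123,A|upstream_gene_variant|"
without_id = "DP=10;CSQ=A|intron_variant|"
```

With these inputs, `has_defined_field(with_id, names, "Existing_variation")` is true and the same call on `without_id` is false.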

When listing the available fields with

filter_vep -i vep.test.vcf.gz --list --vcf_info_field CSQ

it doesn't return the fields stated in the CSQ INFO field (starting with "##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT"). Instead, the other INFO fields in the VCF are listed:

Available fields:

ens-lgil commented 5 years ago

Dear @matmu,

Can you send us the header and a couple of lines of your VCF file, please?

I just tried with the following VCF file:

##VEP="v96" time="2019-04-04 08:04:16" cache="/opt/vep/.vep/homo_sapiens/94_GRCh38" ensembl-funcgen=96.9c3a0cd ensembl=96.7a35428 ensembl-io=96.6e65b30 ensembl-variation=96.db44614 1000genomes="phase3" COSMIC="86" ClinVar="201807" ESP="V2-SSA137" HGMD-PUBLIC="20174" assembly="GRCh38.p12" dbSNP="151" gencode="GENCODE 29" genebuild="2014-07" gnomAD="170228" polyphen="2.2.2" regbuild="16" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID">
1   1748780 rs1014988   G   A   .   .   CSQ=A|upstream_gene_variant|MODIFIER|SLC35E2A|ENSG00000215790|Transcript|ENST00000246421|processed_transcript|||||||||||2788|-1||HGNC|HGNC:20863,A|downstream_gene_variant|MODIFIER|NADK|ENSG00000008130|Transcript|ENST00000341426|protein_coding|||||||||||2452|-1||HGNC|HGNC:29831,A|downstream_gene_variant|MODIFIER|NADK|ENSG00000008130|Transcript|ENST00000341991|protein_coding|||||||||||2452|-1||HGNC|HGNC:29831,A|downstream_gene_variant|MODIFIER|NADK|ENSG00000008130|Transcript|ENST00000342348|protein_coding|||||||||||3691|-1||HGNC|HGNC:29831,A|upstream_gene_variant|MODIFIER|SLC35E2A|ENSG00000215790|Transcript|ENST00000355439|protein_coding|||||||||||2781|-1||HGNC|HGNC:20863,A|downstream_gene_variant|MODIFIER|NADK|ENSG00000008130|Transcript|ENST00000378625|protein_coding|||||||||||2452|-1||HGNC|HGNC:29831,A|downstream_gene_variant|MODIFIER|NADK|ENSG00000008130|Transcript|ENST00000498806|nonsense_mediated_decay|||||||||||4195|-1|cds_start_NF|HGNC|HGNC:29831,A|upstream_gene_variant|MODIFIER|SLC35E2A|ENSG00000215790|Transcript|ENST00000643905|protein_coding|||||||||||2781|-1||HGNC|HGNC:20863,A|upstream_gene_variant|MODIFIER|SLC35E2A|ENSG00000215790|Transcript|ENST00000647043|processed_transcript|||||||||||2861|-1||HGNC|HGNC:20863   GT  1|0
1   2401592 rs3001336   G   A   .   .   CSQ=A|downstream_gene_variant|MODIFIER|PEX10|ENSG00000157911|Transcript|ENST00000288774|protein_coding|||||||||||2372|-1||HGNC|HGNC:8851,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000306256|protein_coding||6/7||||||||||1|cds_end_NF|HGNC|HGNC:30309,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000378512|protein_coding||5/6||||||||||1||HGNC|HGNC:30309,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000378513|protein_coding||4/5||||||||||1||HGNC|HGNC:30309,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000378518|protein_coding||4/4||||||||||1||HGNC|HGNC:30309,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000443438|protein_coding||5/5||||||||||1|cds_end_NF|HGNC|HGNC:30309,A|downstream_gene_variant|MODIFIER|PEX10|ENSG00000157911|Transcript|ENST00000447513|protein_coding|||||||||||3227|-1||HGNC|HGNC:8851,A|upstream_gene_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000462129|retained_intron|||||||||||359|1||HGNC|HGNC:30309,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000488353|protein_coding||4/5||||||||||1||HGNC|HGNC:30309,A|downstream_gene_variant|MODIFIER|PEX10|ENSG00000157911|Transcript|ENST00000507596|protein_coding|||||||||||3246|-1||HGNC|HGNC:8851,A|downstream_gene_variant|MODIFIER|PEX10|ENSG00000157911|Transcript|ENST00000510434|nonsense_mediated_decay|||||||||||4926|-1||HGNC|HGNC:8851,A|intron_variant|MODIFIER|RER1|ENSG00000157916|Transcript|ENST00000605895|protein_coding||5/6||||||||||1||HGNC|HGNC:30309,A|downstream_gene_variant|MODIFIER|PEX10|ENSG00000157911|Transcript|ENST00000650293|protein_coding|||||||||||2469|-1|cds_start_NF|HGNC|HGNC:8851   GT  0|0

and I got the content of the CSQ field (and the other VCF fields) when I ran the command:

filter_vep -i /opt/vep/.vep/output/test_output.vcf.gz --list --vcf_info_field CSQ
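For reference, the CSQ field names are declared in the "Format:" part of the CSQ header description. The Python sketch below shows one way to pull them out of such a header line. It is only an illustration and not the code filter_vep runs; `csq_fields` is a hypothetical helper, and the header is shortened.

```python
import re


def csq_fields(header_line):
    """Return the field names declared after 'Format: ' in a CSQ
    ##INFO header line, or an empty list if there is no such part."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        return []
    return match.group(1).split("|")


# Shortened CSQ header for illustration.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|'
          'SYMBOL|Existing_variation">')
fields = csq_fields(header)
print(fields)
```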


Best regards, Laurent

matmu commented 5 years ago

Dear @ens-lgil,

Please find the VCF below. I have deleted some rows from the header, but it still behaves the same way.

Best regards, Matthias

##FILTER=<ID=PASS,Description="All filters passed">
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP Membership">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=NEGATIVE_TRAIN_SITE,Number=0,Type=Flag,Description="This variant was used to build the negative training set of bad variants">
##INFO=<ID=POSITIVE_TRAIN_SITE,Number=0,Type=Flag,Description="This variant was used to build the positive training set of good variants">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##INFO=<ID=VQSLOD,Number=1,Type=Float,Description="Log odds of being a true variant versus being false under the trained gaussian mixture model">
##INFO=<ID=culprit,Number=1,Type=String,Description="The annotation which was the worst performing in the Gaussian mixture model, likely the reason why the variant was filtered out">
##VEP="v95" time="2019-03-27 15:46:19" cache="/data/icg_munz/.vep/homo_sapiens_merged/95_GRCh38" ensembl-funcgen=95.94439f4 ensembl=95.4f83453 ensembl-io=95.78ccac5 ensembl-variation=95.858de3e 1000genomes="phase3" COSMIC="86" ClinVar="201810" ESP="V2-SSA137" HGMD-PUBLIC="20174" assembly="GRCh38.p12" dbSNP="151" gencode="GENCODE 29" genebuild="2014-07" gnomAD="170228" polyphen="2.2.2" refseq="2018-07-10 14:50:52 - GCF_000001405.38_GRCh38.p12_genomic.gff" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|REFSEQ_MATCH|SOURCE|GIVEN_REF|USED_REF|BAM_EDIT|GENE_PHENO|NEAREST|SIFT|PolyPhen|DOMAINS|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|CADD_PHRED|CADD_RAW|#chr|...omes_controls_AFR_AN|gnomAD_exomes_controls_AFR_nhomalt|gnomAD_exomes_controls_AMR_AC|gnomAD_exomes_controls_AMR_AF|gnomAD_exomes_controls_AMR_AN|gnomAD_exomes_controls_AMR_nhomalt|gnomAD_exomes_controls_AN|gnomAD_exomes_controls_ASJ_AC|gnomAD_exomes_controls_ASJ_AF|gnomAD_exomes_controls_ASJ_AN|gnomAD_exomes_controls_ASJ_nhomalt|gnomAD_exomes_controls_EAS_AC|gnomAD_exomes_controls_EAS_AF|gnomAD_exomes_controls_EAS_AN|gnomAD_exomes_controls_EAS_nhomalt|gnomAD_exomes_controls_FIN_AC|gnomAD_exomes_controls_FIN_AF|gnomAD_exomes_controls_FIN_AN|gnomAD_exomes_controls_FIN_nhomalt|gnomAD_exomes_controls_NFE_AC|gnomAD_exomes_controls_NFE_AF|gnomAD_exomes_controls_NFE_AN|gnomAD_exomes_controls_NFE_nhomalt|gnomAD_exomes_controls_POPMAX_AC|gnomAD_exomes_controls_POPMAX_AF|gnomAD_exomes_controls_POPMAX_AN|gnomAD_exomes_controls_POPMAX_nhomalt|gnomAD_exomes_controls_SAS_AC|gnomAD_exomes_controls_SAS_AF|gnomAD_exomes_controls_SAS_AN|gnomAD_exomes_controls_SAS_nhomalt|gnomAD_exomes_controls_nhomalt|gnomAD_exomes_flag|gnomAD_exomes_nhomalt|gnomAD_genomes_AC|gnomAD_genomes_AF|gnomAD_genomes_AFR_AC|gnomAD_genomes_AFR_AF|gnomAD_genomes_AFR_AN|gnomAD_genomes_AFR_nhomalt|gnomAD_genomes_AMR_AC|gnomAD_genomes_AMR_
AF|gnomAD_genomes_AMR_AN|gnomAD_genomes_AMR_nhomalt|gnomAD_genomes_AN|gnomAD_genomes_ASJ_AC|gnomAD_genomes_ASJ_AF|gnomAD_genomes_ASJ_AN|gnomAD_genomes_ASJ_nhomalt|gnomAD_genomes_EAS_AC|gnomAD_genomes_EAS_AF|gnomAD_genomes_EAS_AN|gnomAD_genomes_EAS_nhomalt|gnomAD_genomes_FIN_AC|gnomAD_genomes_FIN_AF|gnomAD_genomes_FIN_AN|gnomAD_genomes_FIN_nhomalt|gnomAD_genomes_NFE_AC|gnomAD_genomes_NFE_AF|gnomAD_genomes_NFE_AN|gnomAD_genomes_NFE_nhomalt|gnomAD_genomes_POPMAX_AC|gnomAD_genomes_POPMAX_AF|gnomAD_genomes_POPMAX_AN|gnomAD_genomes_POPMAX_nhomalt|gnomAD_genomes_controls_AC|gnomAD_genomes_controls_AF|gnomAD_genomes_controls_AFR_AC|gnomAD_genomes_controls_AFR_AF|gnomAD_genomes_controls_AFR_AN|gnomAD_genomes_controls_AFR_nhomalt|gnomAD_genomes_controls_AMR_AC|gnomAD_genomes_controls_AMR_AF|gnomAD_genomes_controls_AMR_AN|gnomAD_genomes_controls_AMR_nhomalt|gnomAD_genomes_controls_AN|gnomAD_genomes_controls_ASJ_AC|gnomAD_genomes_controls_ASJ_AF|gnomAD_genomes_controls_ASJ_AN|gnomAD_genomes_controls_ASJ_nhomalt|gnomAD_genomes_controls_EAS_AC|gnomAD_genomes_controls_EAS_AF|gnomAD_genomes_controls_EAS_AN|gnomAD_genomes_controls_EAS_nhomalt|gnomAD_genomes_controls_FIN_AC|gnomAD_genomes_controls_FIN_AF|gnomAD_genomes_controls_FIN_AN|gnomAD_genomes_controls_FIN_nhomalt|gnomAD_genomes_controls_NFE_AC|gnomAD_genomes_controls_NFE_AF|gnomAD_genomes_controls_NFE_AN|gnomAD_genomes_controls_NFE_nhomalt|gnomAD_genomes_controls_POPMAX_AC|gnomAD_genomes_controls_POPMAX_AF|gnomAD_genomes_controls_POPMAX_AN|gnomAD_genomes_controls_POPMAX_nhomalt|gnomAD_genomes_controls_nhomalt|gnomAD_genomes_flag|gnomAD_genomes_nhomalt|hg18_chr|hg18_pos(1-based)|hg19_chr|hg19_pos(1-based)|integrated_confidence_value|integrated_fitCons_rankscore|integrated_fitCons_score|phastCons100way_vertebrate|phastCons100way_vertebrate_rankscore|phastCons17way_primate|phastCons17way_primate_rankscore|phastCons30way_mammalian|phastCons30way_mammalian_rankscore|phyloP100way_vertebrate|phyloP100way_vertebrate_rankscore|phyloP17
way_primate|phyloP17way_primate_rankscore|phyloP30way_mammalian|phyloP30way_mammalian_rankscore|pos(1-coor)|ref|refcodon|rs_dbSNP151">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  L-10    L-11    L-12    L-13    L-14    L-15    L-16    L-17    L-18    L-19    L-20    L-21
chr1    10347   .   AACCCT  A   151.66  PASS    AC=1;AF=0.042;AN=24;BaseQRankSum=1.96;ClippingRankSum=0;DP=764;ExcessHet=8.0341;FS=0;InbreedingCoeff=-0.2646;MLEAC=2;MLEAF=0.083;MQ=34.02;MQRankSum=0;QD=15.17;ReadPosRankSum=-0.319;SOR=1.609;VQSLOD=1.09;culprit=DP;CSQ=-|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene||||||||||rs1363828207|1658|1||deletion|HGNC|HGNC:37102||||||||||Ensembl|ACCCT|ACCCT|||OR4F5||||||||||||||||||||||||||||||||5.752|0.340185|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||,-|upstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript||||||||||rs1363828207|1517|1||deletion|HGNC|HGNC:37102|YES|1||||||||Ensembl|ACCCT|ACCCT|||OR4F5||||||||||||||||||||||||||||||||5.752|0.340185|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||1||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||,-|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene||||||||||rs1363828207|4052|-1||deletion|HGNC|HGNC:38034|YES|||||||||Ensembl|ACCCT|ACCCT|||OR4F5||||||||||||||||||||||||||||||||5.752|0.340185||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||,-|downstream_gene_variant|MODIFIER|WASH7P|653635|Transcript|NR_024540.1|transcribed_pseudogene||||||||||rs1363828207|4010|-1||deletion|EntrezGene|HGNC:38034||||||||||RefSeq|ACCCT|ACCCT|OK||OR4F5||||||||||||||||||||||||||||||||5.752|0.340185|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||,-|upstream_gene_variant|MODIFIER|DDX11L1|100287102|Transcript|NR_046018.2|transcribed_pseudogene||||||||||rs1363828207|1522|1||deletion|EntrezGene|HGNC:37102||||||||||RefSeq|ACCCT|ACCCT|||OR4F5||||||||||||||||||||||||||||||||5.752|0.340185|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||    GT:AD:DP:GQ:PL  0/0:54,8:62:0:0,0,1573  0/0:70,8:78:0:0,0,1948  0/0:50,6:56:0:0,0,1393  0/0:49,9:58:0:0,0,1390  0/0:60,9:69:0:0,0,1691  0/0:53,7:60:0:0,0,1578  0/0:55,9:64:0:0,0,1548  0/1:3,7:10:51:188,0,51  0/0:59,5:64:31:0,31,1827    0/0:70,7:77:17:0,17,2229    0/0:78,7:85:28:0,28,2329    0/0:69,10:79:0:0,0,1898
chr1    51972   rs546829777 GGAC    G   874.34  PASS    AC=4;AF=0.182;AN=22;DB;DP=187;ExcessHet=0.0164;FS=0;InbreedingCoeff=0.9066;MLEAC=4;MLEAF=0.182;MQ=34.36;QD=34.24;SOR=2.2;VQSLOD=2.71;culprit=DP;CSQ=-|upstream_gene_variant|MODIFIER|OR4G4P|ENSG00000268020|Transcript|ENST00000606857|unprocessed_pseudogene||||||||||rs546829777|498|1||deletion|HGNC|HGNC:14822|YES|||||||||Ensembl|GAC|GAC|||OR4F5|||||0.0006|0|0|0|0.002|0.001||||||||||||0.002|EUR|||||||||1.836|0.009317|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||,-|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000344272|promoter_flanking_region||||||||||rs546829777||||deletion|||||||||||||||||OR4F5|||||0.0006|0|0|0|0.002|0.001||||||||||||0.002|EUR|||||||||1.836|0.009317||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GT:AD:DP:GQ:PL  ./.:0,0:0:.:0,0,0   0/0:1,0:1:3:0,3,13  1/1:0,13:13:39:585,39,0 0/0:11,0:11:33:0,33,395 1/1:0,8:8:24:360,24,0   0/0:13,0:13:39:0,39,448 0/0:18,0:18:54:0,54,616 0/0:39,0:39:99:0,117,1409   0/0:29,0:29:84:0,84,1260    0/0:12,0:12:36:0,36,433 0/0:24,0:24:72:0,72,852 0/0:19,0:19:57:0,57,668
ens-lgil commented 5 years ago

Dear @matmu,

I think I found the issue: there is a #chr field in your CSQ:


It works when you remove the # character from #chr in the CSQ field.

This comes from dbNSFP:

###chr=#chr from dbNSFP file

We need to investigate it and try to remove the strange characters such as # in the selected fields.
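Until a fix is released, the manual workaround above could be scripted. The Python sketch below is only a suggestion and not part of VEP: `strip_hash_from_csq_header` is a hypothetical helper, and it assumes the CSQ header sits on a single line of the file. Only the header is rewritten; the data lines are positional and contain no field names, so they stay untouched.

```python
def strip_hash_from_csq_header(line, info_field="CSQ"):
    """Remove leading '#' characters from field names in the CSQ header.

    Any other line is returned unchanged. Hypothetical workaround,
    not part of VEP or filter_vep.
    """
    prefix = f"##INFO=<ID={info_field},"
    if not line.startswith(prefix) or "Format: " not in line:
        return line
    head, rest = line.split("Format: ", 1)
    # The field list ends at the closing quote of the Description.
    fields, quote, tail = rest.partition('"')
    cleaned = "|".join(name.lstrip("#") for name in fields.split("|"))
    return head + "Format: " + cleaned + quote + tail


# Shortened header for illustration.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|CADD_RAW|#chr|ref">')
fixed = strip_hash_from_csq_header(header)
print(fixed)
```

Applying this to every line of the uncompressed VCF and writing the result to a new file leaves all other header and data lines as they are.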

Best regards, Laurent

matmu commented 5 years ago

Super, thanks for your help. Yes, an error message would also be helpful. Do you know how filter_vep behaves if there are duplicate fields? Will it then filter using the first occurrence?

ens-lgil commented 5 years ago

Dear @matmu,

About the dbNSFP issue, we are planning to add a fix to prevent it in the next VEP version (96), which should be released next week.

In the unfortunate case where a field is duplicated, the last occurrence of the field will be used for filtering: the code loops over the ordered list of fields and adds the corresponding data into a hash, e.g.: $data{$field_name} = $field_value.
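The same last-one-wins effect can be reproduced outside Perl. Here is a small Python analogue of that loop, with a made-up field list in which AF appears twice. `csq_entry_to_dict` is a hypothetical name, and this is not the actual filter_vep code.

```python
def csq_entry_to_dict(field_names, entry):
    """Map field names to the values of one CSQ entry, in header order."""
    data = {}
    for name, value in zip(field_names, entry.split("|")):
        data[name] = value  # a repeated name overwrites the earlier value
    return data


# Made-up example: AF is declared twice in the field list.
names = ["Allele", "AF", "Consequence", "AF"]
entry = "A|0.01|intron_variant|0.25"
record = csq_entry_to_dict(names, entry)
print(record["AF"])  # prints 0.25, the value of the last AF
```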

However, this is something we need to look at, especially for VCF files with already existing INFO fields and/or using several VEP plugins.

Best regards, Laurent

matmu commented 5 years ago

Ok, thanks.