Closed JIBINJV closed 5 years ago
Hi Jibin, can you please confirm if you are working on GRCh37? If yes, could you please share the command you run? Thanks, Anja
Hi Anja,
Thank you for the reply,
Yes I am working with GRCh37
/home/jj/Software/ensembl-vep/./vep -i MNS01-chr-1_3.recode.vcf --cache -sift b --polyphen b --symbol --numbers --biotype --total_length -o MNS01-chr-1_3.vep --vcf --fields Consequence,Codons,Amino_acids,Gene,SYMBOL,Feature,EXON,PolyPhen,SIFT,Protein_position,BIOTYPE --port 3337 --plugin dbNSFP,/home/jibin/Software/dbNSFP_hg19.gz,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Uniprot_acc_Polyphen2,Uniprot_id_Polyphen2,Uniprot_aapos_Polyphen2,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,LRT_score,LRT_converted_rankscore,LRT_pred,LRT_Omega,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,MutationTaster_model,MutationTaster_AAE,MutationAssessor_UniprotID,MutationAssessor_variant,MutationAssessor_score,MutationAssessor_score_rankscore,MutationAssessor_pred,FATHMM_score,FATHMM_converted_rankscore,FATHMM_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,Transcript_id_VEST3,Transcript_var_VEST3,VEST3_score,VEST3_rankscore,MetaSVM_score,MetaSVM_rankscore,MetaSVM_pred,MetaLR_score,MetaLR_rankscore,MetaLR_pred,Reliability_index,M-CAP_score,M-CAP_rankscore,M-CAP_pred,REVEL_score,REVEL_rankscore,MutPred_score,MutPred_rankscore,MutPred_protID,MutPred_AAchange,MutPred_Top5features,CADD_raw,CADD_raw_rankscore,CADD_phred,DANN_score,DANN_rankscore,fathmm-MKL_coding_score --force_overwrite
Is it possible that some of your input variants have a .
as their start position? Can you also test your dbNSFP file by looking at a region with: tabix dbNSFP_hg19.gz 2:45686-45686?
I checked the vcf file but it doesn't have "." in the start position.
However I was able to run dbNSFP3.5a and produce the annotated file when I didn't use "GM12878_fitCons_score_rankscore, H1-hESC_fitCons_score_rankscore, HUVEC_fitCons_score_rankscore, integrated_fitCons_score_rankscore and MutationAssessor_score_rankscore" options with dbnsfp
Thank you for letting me know. We do have a fix for the first warning message: https://github.com/Ensembl/VEP_plugins/pull/88 I merged it also to release/93 branch. If you update your checkout the message should go away.
I tried to replicate the second type of warning message (Argument "." isn't numeric in numeric..) I was running your command including all the dbNSFP headers you are using on a VCF file which contains variants on all the chromosomes. I only got the the first warning.
Can you please provide an example input line for which you got the second type of warning message?
I'm closing the issue. But please reopen it if you have an further related questions or open a new issue thread.
@at7 I'm having an issue with dbNSFP plugin as well. I decided not to open a new issue, but let me know if it is more appropriate:
I apply VEP using my local cache (ensembl 94), using the GRCh37 version. I downloaded the dbNSFPv3.5a file and proceeded with the instructions in the perl plugin to get a final file to use with hg19 coordinates. It runs smoothly, but constantly I find a lot of unpredictable cases, even when I'm dealing with well known misense mutations. Since I also have CADD scores stored locally, I decided to compare the CADD scores obtained directly using CADD files vs CADD scores that resulted from dbNSFP . As you can see below, for a small set of well known pathogenic mutations, there is an huge discrepancy between the outputs, where the first column represents the scores from CADD files, whereas the second stands for dbNFSP. Why such differences? I'm afraid the same rationale applies for all the other scores that dbNSFP stores, so I may be overlooking a great number of variants just because there is some problem with dbNSFP.
This is the command I used:
vep -i $variants -o $output_vcf --compress_output bgzip --stats_text --hgvs --hgvsg --pubmed --cac he --dir /media/cache --offline --vcf --numbers --regulatory --sift b --polyphen b --force_overwrite -a GRCh37 --variant_class --vcf_info_field ANN --plugin CADD,/media/custom_data/cadd/whole_genome_SNVs.tsv.gz,/media/custom_data/cadd/InDels.tsv.gz --plugin dbNSFP,/media/custom_data/dbNSFP/hg19/dbNSFP_hg19.gz,GERP++_RS,phyloP20way_mammalian,SiPhy_29way_logOdds,phastCons20way_mammalian,MutationAssessor_score,SIFT_score,Polyphen2_HDIV_ score,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_pred,MutationTaster_score,MutationTaster_pred,MutationTaster_model,MutationTaster_AAE,FATHMM_score,FATHMM_pred,fathmm-MKL_coding_score,fathmm-MKL_coding_pred,PROVEAN_score,PROVEAN_pred,CADD_phred,DANN_score,Eigen-phred,Eigen-PC-phred,MetaSVM_score,MetaSVM_pred,MetaLR_score,MetaLR_pred,REVEL_score,GM12878_fitCons_score,H1-hESC_fitCons_score,HUVEC_fitCons_score_rankscore,LRT_score,LRT_pred,MutPred_score,VEST3_score,M-CAP_score
And these are the CADD scores:
CADD_PHRED CADD_phred
28.2 28.2
14.76 None
13.15 None
25.9 25.9
35 35
14.70 None
38 38
None None
20.7 None
26.4 26.4
None None
25.7 None
28.3 None
None None
25.1 None
25.4 None
23.2 None
24.8 None
25.9 25.9
23.2 None
27.5 None
27.5 None
24.0 None
27.6 None
22.8 None
None None
35 35
29.9 29.9
25.8 None
22.7 None
23.9 None
23.9 None
28.6 None
27.3 None
23.9 None
25.5 None
None None
26.5 None
38 38
None None
10.23 10.23
25.2 None
23.2 None
27.8 27.8
37 37
12.84 None
12.59 None
33 33
43 43
46 46
31 31
15.91 15.91
35 35
34 34
26.7 26.7
33 33
15.36 None
25.7 None
25.7 25.7
28.3 28.3
14.66 None
41 41
24.9 24.9
32 32
27.1 27.1
35 None
35 None
23.4 None
35 None
27.3 None
29.9 None
34 None
None None
12.50 None
34 None
Do you have any explanation for this behavior? Thanks, Pedro Barbosa
Hi Pedro, I will take a look and get back to you soon. Anja
I annotated a few variants with the CADD and dbNSFP plugins and always got results from both files. Can you please share some of your input variants for which you only get results from the CADD plugin? Thank you.
Please find attached the small set of variants I'm testing. variants.vep.input.txt
Yesterday, I updated my VEP installation (to include Refseq annotations), which also included the update of all VEP plugins. Suprinsingly, now I can't reproduce the same list as yesterday, as only 3 variants are annotated using the local CADD (even less then using dbNSFP)
CADD_PHRED CADD_phred
None 28.2
None None
None None
None 25.9
None 35
None None
None 38
None None
None None
None 26.4
None None
None None
None None
None None
None None
None None
None None
None None
None 25.9
None None
None None
None None
None None
None None
None None
None None
None 35
None 29.9
None None
None None
None None
None None
None None
None None
None None
None None
None None
None None
None 38
None 10.23
None None
None None
None None
None 27.8
None 37
None None
None None
None 33
None 43
None 46
None 31
None 15.91
None 35
None 34
None 26.7
None 33
None None
None None
None 25.7
None 28.3
None None
None 41
24.9 24.9
32 32
27.1 27.1
None None
None None
None None
None None
None None
None None
None None
None None
None None
None None
Can you please check that the strands of your input variants are set correctly? A lot of the variants I tested are already on the forward strand but in the input file the strand is given as reverse. Let me know if you are still missing annotations after you have corrected the strands.
I see, the alleles were representative of the positive strand, while the strand field displayed the strand of the gene itself, thanks.
Sill, I corrected the strands, and the differences remain (CADD vs dbNSFP)
Can you please send me the updated input file? It is also worth mentioning that the dbNSFP plugin only returns values for variants of the following consequence types: missense_variant stop_lost stop_gained start_lost.
It is the problem indeed. The ones missing in dbNSFP are not missense_variant stop_lost stop_gained start_lost.
Out of curiosity, why does the VEP plugin discards the other types ? As far as I'm aware, dbNSFP file reports splicing related consequences.
That is a good point. We initially chose to filter by consequence type for speed purpose. But we need to revise this and allow for more consequence types to be included. I will run some tests before I submit a change to the plugin code. If you want to already remove the filter just remove this line from the plugin. I'm aiming to get this improved for the next ensembl release. I will also post any updates here. Thank you.
Well by removing that line I mostly get None values.
I'll be waiting for the update. Thank you
@at7 I'm hav
@at7 I'm having an issue with dbNSFP plugin as well. I decided not to open a new issue, but let me know if it is more appropriate:
I apply VEP using my local cache (ensembl 94), using the GRCh37 version. I downloaded the dbNSFPv3.5a file and proceeded with the instructions in the perl plugin to get a final file to use with hg19 coordinates. It runs smoothly, but constantly I find a lot of unpredictable cases, even when I'm dealing with well known misense mutations. Since I also have CADD scores stored locally, I decided to compare the CADD scores obtained directly using CADD files vs CADD scores that resulted from dbNSFP . As you can see below, for a small set of well known pathogenic mutations, there is an huge discrepancy between the outputs, where the first column represents the scores from CADD files, whereas the second stands for dbNFSP. Why such differences? I'm afraid the same rationale applies for all the other scores that dbNSFP stores, so I may be overlooking a great number of variants just because there is some problem with dbNSFP.
This is the command I used:
vep -i $variants -o $output_vcf --compress_output bgzip --stats_text --hgvs --hgvsg --pubmed --cac he --dir /media/cache --offline --vcf --numbers --regulatory --sift b --polyphen b --force_overwrite -a GRCh37 --variant_class --vcf_info_field ANN --plugin CADD,/media/custom_data/cadd/whole_genome_SNVs.tsv.gz,/media/custom_data/cadd/InDels.tsv.gz --plugin dbNSFP,/media/custom_data/dbNSFP/hg19/dbNSFP_hg19.gz,GERP++_RS,phyloP20way_mammalian,SiPhy_29way_logOdds,phastCons20way_mammalian,MutationAssessor_score,SIFT_score,Polyphen2_HDIV_ score,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_pred,MutationTaster_score,MutationTaster_pred,MutationTaster_model,MutationTaster_AAE,FATHMM_score,FATHMM_pred,fathmm-MKL_coding_score,fathmm-MKL_coding_pred,PROVEAN_score,PROVEAN_pred,CADD_phred,DANN_score,Eigen-phred,Eigen-PC-phred,MetaSVM_score,MetaSVM_pred,MetaLR_score,MetaLR_pred,REVEL_score,GM12878_fitCons_score,H1-hESC_fitCons_score,HUVEC_fitCons_score_rankscore,LRT_score,LRT_pred,MutPred_score,VEST3_score,M-CAP_score
And these are the CADD scores:
CADD_PHRED CADD_phred 28.2 28.2 14.76 None 13.15 None 25.9 25.9 35 35 14.70 None 38 38 None None 20.7 None 26.4 26.4 None None 25.7 None 28.3 None None None 25.1 None 25.4 None 23.2 None 24.8 None 25.9 25.9 23.2 None 27.5 None 27.5 None 24.0 None 27.6 None 22.8 None None None 35 35 29.9 29.9 25.8 None 22.7 None 23.9 None 23.9 None 28.6 None 27.3 None 23.9 None 25.5 None None None 26.5 None 38 38 None None 10.23 10.23 25.2 None 23.2 None 27.8 27.8 37 37 12.84 None 12.59 None 33 33 43 43 46 46 31 31 15.91 15.91 35 35 34 34 26.7 26.7 33 33 15.36 None 25.7 None 25.7 25.7 28.3 28.3 14.66 None 41 41 24.9 24.9 32 32 27.1 27.1 35 None 35 None 23.4 None 35 None 27.3 None 29.9 None 34 None None None 12.50 None 34 None
Do you have any explanation for this behavior? Thanks, Pedro Barbosa
Hi @PedroBarbosa I would like to ask you an help in order to sort the dbNSFP3.5 database with Hg19 coordinates. I run the following comand: cat dbNSFP3.5a_variant.chr* \ | /path/to//scripts/dbNSFP_sort.pl 7 8 \
dbNSFP3.5a.hg19.txt bgzip dbNSFP3.5a.hg19.txt tabix -s 1 -b 2 -e 2 dbNSFP3.5a.hg19.txt.gz
but if I open one of the dbNSFP_variant.chr* file, I see that the Hg19 coordinates are in the columns 8 and 9 (#chr pos(1-based) ref alt aaref aaalt rs_dbSNP150 hg19_chr hg19_pos(1-based) hg18_chr hg18_pos(1-based) genename cds_strand refcodon codonpos codon_degeneracy ) Do I need to change the comand launched?
thank you
Hi @at7 or @PedroBarbosa can you help me with the dbNSFP3.5? I'm using the database with the Hg19 coordinates and I'm trying to annotate a vcf but there is something wrong: if I annotate with the 3.5 version I obtained only a few annotations, indeed with the 2.9 version I annotated all the variants. Here an example: grep SIFT_pred myfile_2.9.vcf | wc -l 9416 grep SIFT_pred myfile_3.5.vcf | wc -l 376
Why?
thank you in advance
@MartaRus, could you please send me a few examples for which you get a result for 2.9 but not for 3.5? Thank you
@at7 thank you for your reply! Meanwhile I understood where is the problem but I am not able to find a solution! Maybe you can help me! The script is:
cat dbNSFP3.5a_variant.chr* \ | /mnt/storage/FreeNGS/Prog/Resources/Bin/snpEff_latest/scripts/dbNSFP_sort.pl 7 8 \
dbNSFP3.5a.hg19_2.txt bgzip dbNSFP3.5a.hg19_2.txt tabix -s 1 -b 2 -e 2 dbNSFP3.5a.hg19.txt_2.gz
but I cannot understand why it stops in the middle of the fisrt step (cat dbNSFP3.5a_variant.chr). The process well works for a few chromosomes, then it stops with the writings "killed". I cannot understand why. It is to note that if I tryed to launch the only first step (cat dbNSFP3.5a_variant.chr > total_var) it works well and all the chromosomes are processed, but if I sequentially launch the second part of the script (/path/dbNSFP_sort.pl 7 8 dbNSFP3.5_var_tot > dbNSFP3.5_var_tot_sortHg19 it douesn't work (the error: Usage cat dbNSFP*.txt | dbNSFP_sort.pl chrCol posCol in attachment the sort.pl script
thank you dbNSFP_sort.txt
Hi @MartaRus, the script your are using isn't written by Ensembl. Please contact the author for any questions. Are you trying to tabix dbNSFP files for GRCh37? We have written some documentation for the required steps. Let me know if you have any questions.
Hi, I downloaded dbNSFPv3.5a and processed it as mentioned in "https://github.com/Ensembl/VEP_plugins/blob/release/93/dbNSFP.pm".
However I am getting following error, please help me to solve it
Jibin
Version: ensembl-vep:release_93
Error message
Use of uninitialized value in concatenation (.) or string at /home/jj/.vep/Plugins/dbNSFP.pm line 230. Use of uninitialized value in concatenation (.) or string at /home/jj/.vep/Plugins/dbNSFP.pm line 230. Argument "." isn't numeric in numeric eq (==) at /home/jj/.vep/Plugins/dbNSFP.pm line 283. Argument "." isn't numeric in numeric eq (==) at /home/jj/.vep/Plugins/dbNSFP.pm line 283.
Command Used for processing