a-xavier / tapes

TAPES : a Tool for Assessment and Prioritisation in Exome Studies

Problems with vep annotation #5

Open vedellpt opened 4 years ago

vedellpt commented 4 years ago

I am trying to run tapes using a vep vcf as input. The vep vcf that I am using as input contains the ClinVar significance information in the CLIN_SIG entry which is part of the CSQ INFO field which has description "Consequence annotations from Ensembl VEP".

I have processed VEP VCFs for 37 patients. I know some carry pathogenic variants, but tapes classifies none of them as such. I think it probably has to do with this message in the log:

"2020-03-20 09:51:55.....PS1 done || No trio data, skipping PS2 || No "clinvar_golden_stars" or "clinvar_clnsig" column found. Please annotate your data with a recent clinvar database || All frequency data not found || No domain data found 2020-03-20 09:51:55.....PM5 done"

I think it could also be related to this message:

"2020-03-20 09:51:55.....Starting... || Cannnot calculate PVS1, no splicing annotation. Please annotate with dbscSNV"

Can you please provide some suggestions on how I can get past these problems? Thanks.
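
One way to narrow this down is to check whether the column names from the log message appear in the CSQ header of the VEP VCF at all. The sketch below is only an illustration: it assumes tapes looks for exactly the names quoted in the log (`clinvar_clnsig`, `clinvar_golden_stars`) and that the match is case-sensitive; the helper names and the shortened header line are hypothetical.

```python
import re

# Column names quoted in the tapes log message above. Whether tapes needs
# further columns is not established here (assumption).
WANTED = ("clinvar_clnsig", "clinvar_golden_stars")


def csq_fields(header_lines):
    """Return the field names listed in the VEP CSQ header line, in order."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            match = re.search(r'Format: ([^"]+)', line)
            if match:
                return match.group(1).strip().split("|")
    return []


def missing_fields(header_lines, wanted=WANTED):
    """List the wanted names that are absent from the CSQ field list."""
    present = set(csq_fields(header_lines))
    return [name for name in wanted if name not in present]


# Hypothetical header, shortened for illustration.
example_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|CLIN_SIG">',
]
print(missing_fields(example_header))
```

With a header like this one, where ClinVar significance is present only as `CLIN_SIG`, both names from the log are reported as missing, which would be consistent with the message tapes prints.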

vedellpt commented 4 years ago

I have been able to get db, annotate, and sort to work on the toy dataset that you provide as an example. I have tried a number of things but still cannot get annotate to run successfully on my ANNOVAR input VCF, nor sort on my VEP input VCF. Is the toy example dataset processed any differently from other datasets?

vedellpt commented 4 years ago

For my latest attempt, I took a VEP VCF from the VEP web portal and tried to use it as input to tapes sort. I get the same error, shown below, whether the input file is saved in ASCII, Unicode or Unix format. Can you tell me how to get past this? Is there a way to tell tapes which format the file is in? It seems unable to work that out in this function call.

    [m4@mf4 tapes 2020-03-23 02:18:27] $ cat /research/tapes_sort_veptest8.err
    Traceback (most recent call last):
      File "tapes.py", line 355, in <module>
        main()
      File "tapes.py", line 205, in main
        full_stuff, soft_used = tf.open_csv_file(file_path, acmg_db_path)  # Load annotated csv in pandas
      File "/research/TAPES/tapes/src/t_func.py", line 63, in open_csv_file
        soft_used = check_which_soft_used(csv_file)
      File "/research/TAPES/tapes/src/t_func.py", line 3723, in check_which_soft_used
        for line in input_vcf.readlines():
      File "/research/python/3.6.7/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
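
A byte 0xff at position 0 is consistent with a UTF-16 little-endian byte order mark (0xFF 0xFE), although the traceback shows only the first byte, so this is a guess rather than a diagnosis. A minimal sketch of checking for a byte order mark and rewriting the file as UTF-8 before passing it to tapes might look like this; the function names are hypothetical, and it is an assumption that tapes reads a plain UTF-8 file without trouble.

```python
import codecs

# Byte order marks, checked in order. The UTF-32 LE mark must come before
# the UTF-16 LE mark because it starts with the same two bytes.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(raw, default="utf-8"):
    """Guess the encoding from a leading byte order mark, if there is one."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return default


def to_utf8(raw):
    """Decode with the sniffed encoding and return UTF-8 bytes without a BOM."""
    return raw.decode(sniff_encoding(raw)).encode("utf-8")


def convert_file(src_path, dst_path):
    """Rewrite src_path as UTF-8 at dst_path (both paths supplied by the caller)."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_utf8(raw))


# Simulated file content: a VCF header line saved as UTF-16 LE with a BOM.
sample = codecs.BOM_UTF16_LE + "##fileformat=VCFv4.2\n".encode("utf-16-le")
print(hex(sample[0]), sniff_encoding(sample), to_utf8(sample))
```

If the sniffed encoding comes back as the default, the file has no byte order mark and the cause of the decode error lies elsewhere.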

vedellpt commented 4 years ago

Anyway, congratulations on your PLOS Computational Biology publication and on developing this tool. I think it is a nice publication and that it serves an important purpose.

EugeneEA commented 4 years ago

I also have some problems with VEP VCF annotation, but since the tool does not seem to be supported, I'm not sure it makes sense to open an issue. (I'd like to be wrong, because there are few options for ACMG assignment :( )

NTNguyen13 commented 4 years ago

Hi, I found a way to solve this problem:

You will need two plugins, dbscSNV and dbNSFP, to get all the annotations. Add them to the VEP command like this:

    --plugin dbNSFP,/path/to/dbNSFP_hg19.gz,gnomAD_genomes_AF,gnomAD_exomes_AF,CADD_phred,FATHMM_converted_rankscore,clinvar_clnsig,clinvar_golden_stars,Interpro_domain,SIFT_score,LRT_pred,MutationTaster_pred,MutationAssessor_pred,FATHMM_pred,PROVEAN_score,MetaSVM_pred,MetaLR_pred,M-CAP_pred,fathmm-MKL_coding_pred,GenoCanyon_score,GERP++_RS \
    --plugin dbscSNV,/path/to/dbscSNV1.1_GRCh37.txt.gz
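
Because the dbNSFP field list is long and a single misspelled name is easy to miss, one option is to assemble the two `--plugin` arguments from a list. This is only a sketch: the field names and paths are copied from the command above, the function name is hypothetical, and the rest of the VEP command line is left to the caller.

```python
# dbNSFP fields exactly as listed in the command above.
DBNSFP_FIELDS = [
    "gnomAD_genomes_AF", "gnomAD_exomes_AF", "CADD_phred",
    "FATHMM_converted_rankscore", "clinvar_clnsig", "clinvar_golden_stars",
    "Interpro_domain", "SIFT_score", "LRT_pred", "MutationTaster_pred",
    "MutationAssessor_pred", "FATHMM_pred", "PROVEAN_score", "MetaSVM_pred",
    "MetaLR_pred", "M-CAP_pred", "fathmm-MKL_coding_pred", "GenoCanyon_score",
    "GERP++_RS",
]


def plugin_args(dbnsfp_path, dbscsnv_path, fields=DBNSFP_FIELDS):
    """Build the two --plugin arguments as a list suitable for subprocess."""
    return [
        "--plugin", ",".join(["dbNSFP", dbnsfp_path, *fields]),
        "--plugin", ",".join(["dbscSNV", dbscsnv_path]),
    ]


# Placeholder paths, as in the command above.
args = plugin_args("/path/to/dbNSFP_hg19.gz", "/path/to/dbscSNV1.1_GRCh37.txt.gz")
print(" ".join(args))
```

The returned list can be appended to the rest of a VEP argument list, which avoids shell quoting and line-continuation mistakes.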

EugeneEA commented 4 years ago

Hi, thanks for posting this. I eventually swapped VEP for OpenCRAVAT and rewrote the InterVar code to make the two compatible.

land-mine commented 4 years ago

Hello @EugeneEA, I'm working on variant interpretation. It would be helpful if you could share your modified InterVar code.

SouzaBB commented 3 years ago

Hi, I'm running tapes on VEP output annotated with the plugins exactly as @NTNguyen13 posted above, but I still get the error: All required annotations not found.

Did you manage to solve the issue?