Ensembl / ensembl-vep

The Ensembl Variant Effect Predictor predicts the functional effects of genomic variants
https://www.ensembl.org/vep
Apache License 2.0

INFO: disabling SIFT, POLYPHEN ... #514

Closed madzafv closed 5 years ago

madzafv commented 5 years ago

I'm running ~/install/ensembl-vep-release-96/vep -i cut_wbm_par5_fil_genes.vcf -o test.txt --species taeniopygia_guttata --cache --dir_cache ./ --force_overwrite --canonical --symbol --tab --everything

and getting this

2019-06-21 20:25:45 - INFO: disabling SIFT
2019-06-21 20:25:45 - INFO: disabling PolyPhen
2019-06-21 20:25:45 - INFO: Database will be accessed when using --hgvs
2019-06-21 20:25:45 - INFO: Database will be accessed when using --hgvsc
2019-06-21 20:25:45 - INFO: Database will be accessed when using --hgvsp

and the program freezes after writing only this to the output file:

## ENSEMBL VARIANT EFFECT PREDICTOR v96.3
## Output produced at 2019-06-21 20:25:45
## Connected to taeniopygia_guttata_core_96_1 on ensembldb.ensembl.org
## Using cache in ./taeniopygia_guttata/96_taeGut3.2.4
## Using API version 96, DB version 96
## ensembl-funcgen version 96.d901739
## ensembl-variation version 96.617872b
## ensembl version 96.af6c2b8
## ensembl-io version 96.6e65b30
## genebuild version 2009-02
...

## MAX_AF_POPS : Populations in which maximum allele frequency was observed
## CLIN_SIG : ClinVar clinical significance of the dbSNP variant
## SOMATIC : Somatic status of existing variant
## PHENO : Indicates if existing variant(s) is associated with a phenotype, disease or trait; multiple values correspond to multiple variants
## PUBMED : Pubmed ID(s) of publications that cite existing variant
## MOTIF_NAME : The source and identifier of a transcription factor binding profile (TFBP) aligned at this position
## MOTIF_POS : The relative position of the variation in the aligned TFBP
## HIGH_INF_POS : A flag indicating if the variant falls in a high information position of the TFBP
## MOTIF_SCORE_CHANGE : The difference in motif score of the reference and variant sequences for the TFBP
#Uploaded_variation  Location  Allele  Gene  Feature  Feature_type  Consequence  cDNA_position  CDS_position  Protein_position  Amino_acids  Codons  Existing_variation  IMPACT  DISTANCE  STRAND  FLAGS  VARIANT_CLASS  SYMBOL  SYMBOL_SOURCE  HGNC_ID  BIOTYPE  CANONICAL  TSL  APPRIS  CCDS  ENSP  SWISSPROT  TREMBL  UNIPARC  GENE_PHENO  EXON  INTRON  DOMAINS  miRNA  HGVSc  HGVSp  HGVS_OFFSET  AF  AFR_AF  AMR_AF  EAS_AF  EUR_AF  SAS_AF  AA_AF  EA_AF  gnomAD_AF  gnomAD_AFR_AF  gnomAD_AMR_AF  gnomAD_ASJ_AF  gnomAD_EAS_AF  gnomAD_FIN_AF  gnomAD_NFE_AF  gnomAD_OTH_AF  gnomAD_SAS_AF  MAX_AF  MAX_AF_POPS  CLIN_SIG  SOMATIC  PHENO  PUBMED  MOTIF_NAME  MOTIF_POS  HIGH_INF_POS  MOTIF_SCORE_CHANGE

What could be going on?

dglemos commented 5 years ago

Hello @madzayasodara, thank you for the detailed question. The messages you get are just informational:
1) You're using --everything, which switches on --sift b and --polyphen b. SIFT and PolyPhen don't support your species, which is why you get the 'disabling ...' messages. You can see here which species have SIFT results.
2) --everything also switches on --hgvs. There are two ways to get the HGVS notations: from the database or from a FASTA file. You didn't supply a FASTA file, so by default the database is used (hence 'Database will be accessed when using --hgvs').

About the output, it could be a problem with your input file. Could you please send your file?
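
For reference, a minimal sketch of the FASTA route from point 2. VEP's --fasta option takes a FASTA file, which should let it build the HGVS notations from the file instead of the database. The FASTA path is a placeholder that you would supply yourself, and the function name and VEP_BIN variable are only illustrative; the remaining flags are copied from the command in the question.

```shell
# Sketch: run the command from this issue with --fasta added, so that
# --hgvs (switched on by --everything) can read sequence from a local file.
# VEP_BIN is an illustrative override; it defaults to the install path
# used earlier in this thread.
run_vep_with_fasta() {
    # $1 = input VCF
    # $2 = path to a reference FASTA (placeholder, not a file from this thread)
    "${VEP_BIN:-$HOME/install/ensembl-vep-release-96/vep}" \
        -i "$1" -o test.txt \
        --species taeniopygia_guttata \
        --cache --dir_cache ./ \
        --force_overwrite --canonical --symbol --tab --everything \
        --fasta "$2"
}
```

It would be called as run_vep_with_fasta cut_wbm_par5_fil_genes.vcf /path/to/reference.fa, where the second argument is whatever FASTA matches your cache's assembly.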

madzafv commented 5 years ago

Got it. I don't know which of the things I did fixed the problem... I can run vep locally now, but when I submit to the cluster's job scheduler, it complains about not finding Perl modules. I tried to install them with cpan Try::Tiny, but I get "Do not have write permissions on '/usr/local/share/man/man3'". How can I install these modules locally and point the program to them? I checked and I'm using the global Perl installation. Should I open another issue for this? Thanks!

dglemos commented 5 years ago

Try::Tiny is not a vep requirement anymore; here you can see the current requirements for vep installation. Could you run your vep command on the cluster and post the error message you get here?

madzafv commented 5 years ago

I installed perl5 and its modules locally following the instructions in this gist: https://gist.github.com/ckandoth/1f01d8f3692bb8de7f2929f259a4035f
and it's working now.
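
For anyone hitting the same write-permission error, here is one common way to use a per-user module directory. This is a sketch and not necessarily what the gist above does. The $HOME/perl5 prefix and the function name are assumptions, and DBI appears only as an example module name.

```shell
# Sketch: make a per-user Perl module directory visible to perl.
# cpanm's --local-lib option installs modules under <prefix>/lib/perl5,
# so that is the directory prepended to PERL5LIB here.
use_local_perl_lib() {
    # $1 = prefix where the modules were installed (e.g. "$HOME/perl5", an assumption)
    PERL5LIB="$1/lib/perl5${PERL5LIB:+:$PERL5LIB}"
    export PERL5LIB
}

# Example, not run here because it needs cpanm and network access:
#   cpanm --local-lib "$HOME/perl5" DBI    # DBI is only an example module
#   use_local_perl_lib "$HOME/perl5"
```

On a cluster, the same export would likely need to go into the job script as well, since the scheduler may not read your login shell configuration.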

When I do

~/install/ensembl-vep-release-96/vep -i cut_wbm_par5_fil_genes.vcf -o test.txt --species taeniopygia_guttata --cache --dir_cache ./ --force_overwrite --canonical --symbol --tab --everything

I get that info

2019-06-21 20:25:45 - INFO: disabling SIFT
2019-06-21 20:25:45 - INFO: disabling PolyPhen
2019-06-21 20:25:45 - INFO: Database will be accessed when using --hgvs
2019-06-21 20:25:45 - INFO: Database will be accessed when using --hgvsc
2019-06-21 20:25:45 - INFO: Database will be accessed when using --hgvsp

(which I understand now)

The program starts to run, reads a few thousand lines of my VCF file, and then gets stuck.

This doesn't happen when I don't use --everything...

dglemos commented 5 years ago

How many variants does your file have? If it's a big file and you're using --everything, vep is going to load a lot of data at once. By default --buffer_size is 5000, which means vep reads 5000 variants into memory at a time. Could you run with --buffer_size 500 and see if it helps? See the buffer size documentation.
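
A minimal sketch of both steps: counting the variants in the file, then re-running the original command with the smaller buffer. The function names and the VEP_BIN variable are illustrative only; the flags are the ones from the command above plus --buffer_size.

```shell
# Count variant records in a VCF: every non-empty line that is not a '#' header.
count_variants() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Re-run the command from this thread with an explicit buffer size.
# VEP_BIN is an illustrative override; it defaults to the install path
# used earlier in this thread.
run_vep_buffered() {
    # $1 = input VCF, $2 = number of variants held in memory at once (default 5000)
    "${VEP_BIN:-$HOME/install/ensembl-vep-release-96/vep}" \
        -i "$1" -o test.txt \
        --species taeniopygia_guttata \
        --cache --dir_cache ./ \
        --force_overwrite --canonical --symbol --tab --everything \
        --buffer_size "$2"
}
```

For example, count_variants cut_wbm_par5_fil_genes.vcf prints the record count, and run_vep_buffered cut_wbm_par5_fil_genes.vcf 500 runs vep with the reduced buffer. A smaller buffer trades some speed for lower memory use.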

madzafv commented 5 years ago

Yes this worked. Thank you!