Bioconductor / ensemblVEP

[DEPRECATED] R Interface to Ensembl Variant Effect Predictor
https://bioconductor.org/packages/ensemblVEP

Use a different default Ensembl host when old versions of VEP are in use #2

Closed jayoung closed 6 years ago

jayoung commented 6 years ago

Hi Lori,

I'm submitting a GitHub issue for something we already talked about on the support site (here: https://support.bioconductor.org/p/106976/#107036).

I ran into trouble using v88 of the VEP script with ensemblVEP. The default host used by the Bioc package (useastdb.ensembl.org) supports only the current and current-minus-1 versions of the VEP script (according to https://www.ensembl.org/info/data/mysql.html), and the script gives a very uninformative error if you try to use an older version with the US host, whether on the command line or through R/Bioc.

Here's the example code - this fails for me with a mysterious error:


library(ensemblVEP)
file <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")

myparam88a <- VEPParam( version=88, dataformat=c(vcf=TRUE)) 
vcf88a <- ensemblVEP(file, param=myparam88a)
#Can't use an undefined value as an ARRAY reference at /home/jayoung/malik_lab_shared/perl/ensembl/modules/Bio/EnsEMBL/Registry.pm line 2546.
#Error in .io_check_exists(path(con)) : file(s) do not exist:
#  '/tmp/Rtmpi8eJw2/file52c81bd6649'

When I specify a different host, it works fine:

myparam88b <- VEPParam( version=88, dataformat=c(vcf=TRUE), database=c(host="ensembldb.ensembl.org")) 
vcf88b <- ensemblVEP(file, param=myparam88b)

Would it be possible to change the default host when older versions of VEP are specified? I didn't expect that to be an issue, because on my real data I was using a local cached database and didn't think I was contacting any host at all. The same fix works on my data, though, so I'm happy:


tempVEPcacheDir <- "/fh/fast/malik_h/grp/public_databases/Ensembl/VariantEffectPredictorCache"
tempVEPpluginsDir <- "/fh/fast/malik_h/grp/public_databases/Ensembl/VariantEffectPredictorCache/plugins"

## this fails: 
myparamsYeast88a <- VEPParam( version=88,
                            input=c(species="saccharomyces_cerevisiae"), 
                            cache=c(cache=TRUE, dir=tempVEPcacheDir,
                                dir_cache=tempVEPcacheDir, dir_plugins=tempVEPpluginsDir),
                            database=c(database=FALSE),
                            dataformat=c(vcf=TRUE) ) 
tempTest88a <- ensemblVEP(testFile, myparamsYeast88a) 
#Can't use an undefined value as an ARRAY reference at /home/jayoung/malik_lab_shared/perl/ensembl/modules/Bio/EnsEMBL/Registry.pm line 2546.
#Error in .io_check_exists(path(con)) : file(s) do not exist:
#  '/tmp/RtmpxuFFsr/file560c5748392c'

## this works:
myparamsYeast88b <- VEPParam( version=88,
                            input=c(species="saccharomyces_cerevisiae"), 
                            cache=c(cache=TRUE, dir=tempVEPcacheDir,
                                dir_cache=tempVEPcacheDir, dir_plugins=tempVEPpluginsDir),
                            database=c(database=FALSE, host="ensembldb.ensembl.org"),
                            dataformat=c(vcf=TRUE) ) 
tempTest88b <- ensemblVEP(testFile, myparamsYeast88b) 
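Until the default changes, one way to make this failure easier to diagnose is to wrap the call and attach a hint about the host whenever an old VEP version is in use. This is only a minimal sketch: `withHostHint`, `version`, and `currentVersion` are hypothetical names that are not part of ensemblVEP, and the caller has to supply the current Ensembl release, since nothing here looks it up.

```r
# Hypothetical wrapper, NOT part of ensemblVEP.
# Evaluates `expr`; if it fails and the requested VEP version is older than
# current minus 1, re-raises the error with a hint about the database host.
withHostHint <- function(expr, version, currentVersion) {
    tryCatch(
        expr,  # evaluated lazily here, so errors are caught below
        error = function(e) {
            if (version < currentVersion - 1) {
                stop(conditionMessage(e),
                     "\nHint: VEP version ", version,
                     " may not be supported by the default host; try ",
                     "database=c(host=\"ensembldb.ensembl.org\") in VEPParam().",
                     call. = FALSE)
            }
            stop(e)  # otherwise re-signal the original error unchanged
        }
    )
}
```

It could be used as `withHostHint(ensemblVEP(file, param=myparam88a), version=88, currentVersion=92)`, where 92 is just a placeholder for whatever the current release is.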

Session info is below (truncated). Thanks,

Janet



sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

other attached packages:
[1] ensemblVEP_1.20.1          VariantAnnotation_1.22.3
[3] Rsamtools_1.28.0           Biostrings_2.44.2         
[5] XVector_0.16.0             SummarizedExperiment_1.6.0
[7] DelayedArray_0.2.0         matrixStats_0.52.2        
[9] Biobase_2.36.0             GenomicRanges_1.28.4      
[11] GenomeInfoDb_1.12.0        IRanges_2.10.3            
[13] S4Vectors_0.14.0           BiocGenerics_0.22.0    

lshep commented 6 years ago

After discussion with the team, we have decided not to change the host, but we will try to make this distinction clearer in the documentation.

jayoung commented 6 years ago

Thanks again. Given that the default host is the mirror, and Ensembl tell us that it only works with current and immediate previous versions of VEP, is it possible to include some sort of simple addition (hidden to the user) that would go like this (pseudocode):

if (versionRequested < (currentVersion-1) ) {
    hostToUse <- 'ensembldb.ensembl.org'
    ## alternatively, simply warn the user that they need to specify a different host if they use an older version
}

That way there won't be a mysterious perl-derived error that's hard to track down. What do you think? thanks, Janet
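The pseudocode above could be made runnable roughly as follows. This is a sketch under stated assumptions, not a proposal for the exact implementation: `chooseHost` and its arguments are hypothetical names that do not exist in ensemblVEP, and `currentVersion` has to be passed in by the caller.

```r
# Hypothetical helper, NOT part of ensemblVEP.
# Returns the main Ensembl host for VEP versions older than current minus 1,
# and the US mirror otherwise.
chooseHost <- function(versionRequested, currentVersion,
                       mirror = "useastdb.ensembl.org",
                       main = "ensembldb.ensembl.org") {
    stopifnot(is.numeric(versionRequested), is.numeric(currentVersion))
    if (versionRequested < currentVersion - 1) {
        # Tell the user why the host differs from the usual default.
        message("VEP version ", versionRequested,
                " is older than current minus 1; using host ", main)
        return(main)
    }
    mirror
}
```

For example, with a placeholder current release of 92, `chooseHost(88, 92)` would pick ensembldb.ensembl.org, while `chooseHost(91, 92)` would keep the mirror.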

lshep commented 6 years ago

I have reopened the issue and will look into this further

lshep commented 6 years ago

This is updated in the most recent devel version, 1.21.4. We no longer specify a default host, so the defaults from Ensembl VEP are used, and the documentation now advises US users to specify the mirror to reduce latency.
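As a rough illustration of that behaviour (not the package's actual code), a host would be passed to VEP only when the user sets one. `buildHostArgs` is a hypothetical name; `--host` is the VEP command-line option for the database host.

```r
# Hypothetical illustration, NOT the ensemblVEP implementation.
# Builds the host-related command-line arguments for the VEP script:
# nothing when no host is given (VEP then uses its own default),
# otherwise "--host <hostname>".
buildHostArgs <- function(host = NULL) {
    if (is.null(host) || !nzchar(host)) {
        return(character(0))
    }
    c("--host", host)
}
```

With this shape, US users who want the mirror would pass `host="useastdb.ensembl.org"` explicitly, and everyone else would get whatever host their VEP version defaults to.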

jayoung commented 6 years ago

Thanks very much - that solution makes a lot of sense.