Open paul-shannon opened 3 years ago
I think this is a better problem report:
eQTL_Catalogue.fetch(unique_id="GTEx.brain_frontal_cortex", chrom="8", bp_lower=27610984, bp_upper=27610987)
[1] "CONDA:: Could not identify tabix executable in echoR env. Defaulting to generic 'tabix' command"
[1] "tabix ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz 8:27610984-27610987"
[E::hts_open_format] Failed to open file "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz" : Operation timed out
Couldn't open "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz": Operation timed out
My tabix is Cellar/htslib/1.14/bin/tabix
More info: running on Ubuntu with a different tabix gives the same problem:
tabix ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz 8:27610984-276109801
[E::hts_open_format] Failed to open file "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz" : Operation timed out
Couldn't open "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/GTEx/ge/GTEx_ge_brain_frontal_cortex.all.tsv.gz": Operation timed out
Any thoughts? It's clear this problem is outside of catalogueR!
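For anyone reproducing this by hand: the logged tabix command shows that the chrom, bp_lower and bp_upper arguments end up as a single region string of the form CHROM:START-END. A minimal sketch of that mapping follows. The helper name tabix_region is hypothetical (it is not part of catalogueR or htslib), and the validation rules are my own assumption about what is worth checking before a remote call.

```shell
# tabix_region CHROM START END
# Prints "CHROM:START-END", the region syntax seen in the logged tabix command.
# Hypothetical helper for illustration only; not part of catalogueR or htslib.
# Note: plain POSIX sh, so the variables below are global to the shell.
tabix_region() {
  chrom=$1 start=$2 end=$3
  # Reject empty or non-numeric coordinates before comparing them.
  for n in "$start" "$end"; do
    case "$n" in
      ''|*[!0-9]*)
        echo "coordinates must be non-negative integers" >&2
        return 1
        ;;
    esac
  done
  # Reject inverted ranges.
  if [ "$start" -gt "$end" ]; then
    echo "start ($start) is greater than end ($end)" >&2
    return 1
  fi
  printf '%s:%s-%s\n' "$chrom" "$start" "$end"
}
```

Calling tabix_region 8 27610984 27610987 prints the same region string that appears in the first logged command.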
Hi @paul-shannon, glad you're finding this tool useful. Thanks for pointing out this issue. I'll look into this and try to figure out what's going on here.
Some potential sources:
Potentially related: https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/issues/15
@kauralasoo is there anything on eQTL Catalogue's end that might be causing unstable connections to the FTP server?
I just confirmed that the file paths haven't changed, so they do indeed seem to exist.
Hi @paul-shannon and @bschilder,
We just received confirmation from the EBI helpdesk that the root cause was that Paul's IP address had been blocked by the EBI firewall. Paul's IP has been whitelisted now, but unfortunately there is no good solution to prevent this from happening to other users, because tabix requests over FTP (incomplete downloads) look a lot like DDoS attacks to the firewall. The REST API is much more robust, because it is able to rate limit the number of requests by IP address on its own.
Best, Kaur
Thanks so much for the response @kauralasoo! This is all really helpful info. I'll make some adjustments to catalogueR and may make it so that the REST API is the default method.
eQTL_Catalogue.query so that the default is use_tabix=FALSE due to instability of using tabix with the EBI server.
Hi Brian,
One possible caution: Kaur explained to me this about GTEx:
Unfortunately the uniformly processed GTEx summary statistics are currently not available via the API. We hope to fix this with the next release planned for January 2022. However, we do have the official GTEx V8 summary statistics in the API. The study ID for those is GTEx_V8. Thus, this command works:
We've found that the official imported GTEx v8 summary statistics have slightly better power than our re-processed ones, probably due to better handling of covariates.
So perhaps, in your code, in the construction of the REST url, you could substitute like this, at least until the next release?
study=GTEx_V8 for study=GTEX
As it is now, none of the valuable GTEx eQTLs are available when using the REST interface to catalogueR.
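The substitution suggested here could be sketched as a small lookup applied just before the query string is assembled. This is only an illustration: the function names rest_study_id and study_param are hypothetical, the only mapping taken from this thread is GTEx to GTEx_V8, and the rest of the REST URL is left out because it is not given here.

```shell
# rest_study_id STUDY
# Maps a study name to the ID to send to the REST API.
# Only the GTEx -> GTEx_V8 substitution comes from this thread;
# every other name is passed through unchanged (an assumption).
rest_study_id() {
  case "$1" in
    GTEx|GTEX) printf '%s\n' "GTEx_V8" ;;
    *)         printf '%s\n' "$1" ;;
  esac
}

# study_param STUDY
# Prints the study portion of a query string, e.g. "study=GTEx_V8".
study_param() {
  printf 'study=%s\n' "$(rest_study_id "$1")"
}
```

Keeping the mapping in one place would make it easy to drop again once the uniformly processed GTEx statistics are available via the API.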
Thanks for the helpful info @paul-shannon, hadn't realized this!
catalogueR::eQTL_Catalogue.list_datasets currently relies on the metadata provided here, tabix_ftp_paths.tsv:
https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/blob/master/tabix/tabix_ftp_paths.tsv
It looks like there is another file called tabix_ftp_paths_imported.tsv: https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/blob/master/tabix/tabix_ftp_paths_imported.tsv
I'll modify catalogueR::eQTL_Catalogue.list_datasets to integrate this second file as well (with a tryCatch in case it doesn't exist in the future).
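The actual change would live in R, using tryCatch. The same fallback idea can be sketched at the shell level for local copies of the two metadata files. This is a rough illustration under stated assumptions: the function name merge_metadata is hypothetical, and it assumes both TSVs start with a header row, share the same columns, and end with a newline.

```shell
# merge_metadata PRIMARY [IMPORTED]
# Prints PRIMARY in full, then the data rows of IMPORTED if it is readable.
# If IMPORTED is missing, a note goes to stderr and PRIMARY alone is used.
# Hypothetical helper; assumes both files share one header and column layout.
merge_metadata() {
  primary=$1 imported=${2:-}
  if [ ! -r "$primary" ]; then
    echo "primary metadata not readable: $primary" >&2
    return 1
  fi
  cat "$primary"
  if [ -n "$imported" ] && [ -r "$imported" ]; then
    # Skip the header row of the second file (assumed identical to the first).
    tail -n +2 "$imported"
  else
    echo "note: imported metadata unavailable; using primary file only" >&2
  fi
}
```

The point of the design is that a missing second file degrades to the old behaviour instead of failing the whole dataset listing.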
I've just updated the metadata to include GTEX_V8. I also added a new arg to eQTL_Catalogue.list_datasets called include_imported. Setting this to TRUE (default) will integrate the additional datasets in /tabix_ftp_paths_imported.tsv
Currently implemented in the dev branch.
I'm in the process of overhauling catalogueR to make it compatible with (and take advantage of) the rest of the echoverse, which has expanded quite a bit and is much more robust now.
@kauralasoo has anything changed regarding using tabix to query the eQTL Catalogue? If not, I'm going to add the following instructions whenever someone tries to use the fetch_tabix() function:
WARNING: Querying eQTL Catalogue with tabix will only work
if your IP address has been whitelisted by an EMBL-EBI server administrator.
Please request access via this form:
https://www.ebi.ac.uk/about/contact/support/
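For people calling tabix directly instead of going through fetch_tabix(), the same notice could be attached to a thin wrapper. A minimal sketch follows; the name warn_then_tabix is hypothetical, the warning text is copied from the proposal above, and the wrapper simply forwards a file URL and a region to tabix in the same form as the commands logged earlier in this thread.

```shell
# warn_then_tabix URL REGION
# Prints the proposed warning to stderr, then runs tabix with both arguments.
# Hypothetical wrapper for illustration; it is not part of catalogueR.
warn_then_tabix() {
  cat >&2 <<'EOF'
WARNING: Querying eQTL Catalogue with tabix will only work
if your IP address has been whitelisted by an EMBL-EBI server administrator.
Please request access via this form:
https://www.ebi.ac.uk/about/contact/support/
EOF
  tabix "$1" "$2"
}
```

Sending the warning to stderr keeps stdout clean, so the query output can still be piped or redirected to a file.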
Thanks for this fine package - very useful in our work on Alzheimer's Disease.
I find intermittent - sometimes lasting - problems with the ftp service the package uses. Here is an example, establishing first that connectivity is good, then showing the error.
The specific file request times out: