Closed yesitsjess closed 3 years ago
Someone might have a better thought but:
What versions of R and Bioconductor are you using? Please post the output of sessionInfo().
When I run this currently, I do get information back:
> library(GenomeInfoDb)
> Seqinfo(genome="hg19")
Seqinfo object with 93 sequences (1 circular) from hg19 genome:
seqnames seqlengths isCircular genome
chr1 249250621 FALSE hg19
chr2 243199373 FALSE hg19
chr3 198022430 FALSE hg19
chr4 191154276 FALSE hg19
chr5 180915260 FALSE hg19
... ... ... ...
chrUn_gl000245 36651 FALSE hg19
chrUn_gl000246 38154 FALSE hg19
chrUn_gl000247 36422 FALSE hg19
chrUn_gl000248 39786 FALSE hg19
chrUn_gl000249 38502 FALSE hg19
so a proxy setting could well be the cause. Do you know if you have issues connecting to other websites or datasets besides this one? Are you running from an institution that might have a firewall and proxy set up?
There is some information in ?download.file about proxies that might be useful, as well as these pages I found about setting a proxy globally for R: Rstudio proxy, Sys.env for proxy, and Proxy settings for R.
> sessionInfo()
R version 3.5.0 (2018-04-23)
GenomeInfoDb_1.18.2
BiocInstaller_1.32.1
I'm definitely behind a firewall but I've set up my proxy settings using
Sys.setenv(http_proxy="proxy")
Sys.setenv(https_proxy="proxy")
Using the curl package works fine, e.g.
readLines(curl(url="http://www.google.co.uk"))
It's a bit of a pain, but I'd be happy to download the correct information from UCSC directly and change the function in CERES so it doesn't try to connect anymore. I'm just not sure which is the correct file for hg19 on their downloads page.
EDIT: done a bit of digging, think the file is this one so will have a go at pointing at that now
It would be helpful to debug this. I believe the essential code is
url = "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz"
download.file(url, tempfile(), quiet = TRUE)
I would focus on getting that to work. I believe the http_proxy setting should not be "proxy", but rather the full address of the proxy server, in URL form. From the help page:
The form of 'http_proxy' should be 'http://proxy.dom.com/' or 'http://proxy.dom.com:8080/' where the port defaults to '80' and the trailing slash may be omitted.
I would also experiment with setting options(download.file.method = "libcurl") or "wininet".
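As a minimal sketch of the two suggestions above: the host and port below are the placeholder values from the help page, not a real proxy, so substitute your own.

```r
# Placeholder proxy host and port taken from the ?download.file example;
# replace them with your institution's values.
proxy_url <- sprintf("http://%s:%d/", "proxy.dom.com", 8080L)

# Set the proxy for this R session in the documented URL form.
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)

# Make download.file() use libcurl by default ("wininet" is the
# Windows-only alternative mentioned above).
options(download.file.method = "libcurl")
```

These settings only last for the current session; whether they fix the connection depends on how the proxy itself is configured.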
No problem with that:
> url = "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz"
> download.file(url, tempfile())
trying URL 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz'
Content type 'application/x-gzip' length 837 bytes
==================================================
downloaded 837 bytes
Also, I know the value isn't literally "proxy", but I didn't think it was a good idea to post my proxy details externally ;) I'm being paranoid perhaps & sorry for being unclear.
I think something very strange is going on... if I repeatedly spam the same command it does occasionally work.
> Sys.setenv(http_proxy="http://proxy:port")
> Seqinfo(genome="hg19")
Error in function (type, msg, asError = TRUE) :
Failed connect to ftp.ncbi.nlm.nih.gov:21; Connection refused
> Sys.setenv(http_proxy="http://proxy:port")
> Seqinfo(genome="hg19")
Error in function (type, msg, asError = TRUE) :
Failed connect to ftp.ncbi.nlm.nih.gov:21; Connection refused
> Sys.setenv(http_proxy="http://proxy:port")
> Seqinfo(genome="hg19")
Seqinfo object with 93 sequences (1 circular) from hg19 genome:
seqnames seqlengths isCircular genome
chr1 249250621 FALSE hg19
chr2 243199373 FALSE hg19
chr3 198022430 FALSE hg19
chr4 191154276 FALSE hg19
chr5 180915260 FALSE hg19
... ... ... ...
chrUn_gl000245 36651 FALSE hg19
chrUn_gl000246 38154 FALSE hg19
chrUn_gl000247 36422 FALSE hg19
chrUn_gl000248 39786 FALSE hg19
chrUn_gl000249 38502 FALSE hg19
So I can tell it's definitely something on my end, not yours. Thanks for your help :)
If anyone's aware of a workaround using something closer to download.file(url, tempfile()), please let me know, because that never fails and the fact that the other call fails 90% of the time is very frustrating.
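One possible workaround along those lines, offered as a sketch rather than as what GenomeInfoDb does internally: download chromInfo.txt.gz with download.file() and build a Seqinfo object from it by hand. The helper names read_chrom_info() and chrom_info_to_seqinfo() are hypothetical, the three column names assume the layout of UCSC's chromInfo table, and treating chrM as the only circular sequence is an assumption based on the "1 circular" in the output above.

```r
# Hypothetical helper: parse a UCSC chromInfo.txt.gz file (tab-separated,
# assumed columns: chrom, size, fileName) into a data frame.
read_chrom_info <- function(path) {
  read.table(gzfile(path), sep = "\t", header = FALSE,
             col.names = c("chrom", "size", "fileName"),
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}

# Hypothetical helper: build a Seqinfo object without contacting NCBI.
# Assumes chrM is the only circular sequence.
chrom_info_to_seqinfo <- function(chrom_info, genome = "hg19") {
  GenomeInfoDb::Seqinfo(seqnames   = chrom_info$chrom,
                        seqlengths = chrom_info$size,
                        isCircular = chrom_info$chrom == "chrM",
                        genome     = genome)
}

# Usage (needs network access and the GenomeInfoDb package):
# url  <- "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz"
# dest <- tempfile(fileext = ".txt.gz")
# download.file(url, dest, quiet = TRUE, mode = "wb")
# si <- chrom_info_to_seqinfo(read_chrom_info(dest))
```

The sequences come out in the order of the downloaded file, which may differ from the order Seqinfo(genome="hg19") returns, so downstream code that relies on sequence order would need a sort step.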
After it fails, what does the command traceback() say?
> traceback()
14: fun(structure(list(message = msg, call = sys.call()), class = c(typeName,
"GenericCurlError", "error", "condition")))
13: function (type, msg, asError = TRUE)
{
if (!is.character(type)) {
i = match(type, CURLcodeValues)
typeName = if (is.na(i))
character()
else names(CURLcodeValues)[i]
}
typeName = gsub("^CURLE_", "", typeName)
fun = (if (asError)
stop
else warning)
fun(structure(list(message = msg, call = sys.call()), class = c(typeName,
"GenericCurlError", "error", "condition")))
}(7L, "Failed connect to ftp.ncbi.nlm.nih.gov:21; Connection refused",
TRUE)
12: curlPerform(curl = curl, .opts = opts, .encoding = .encoding)
11: getURL(url)
10: list_ftp_dir(url)
9: .make_assembly_report_URL(assembly_accession)
8: fetch_assembly_report(assembly_accession, AssemblyUnits = AssemblyUnits)
7: FUN(genome = names(SUPPORTED_UCSC_GENOMES)[idx], circ_seqs = supported_genome$circ_seqs,
assembly_accession = supported_genome$assembly_accession,
AssemblyUnits = supported_genome$AssemblyUnits, special_mappings = supported_genome$special_mappings,
unmapped_seqs = supported_genome$unmapped_seqs, drop_unmapped = supported_genome$drop_unmapped,
goldenPath_url = goldenPath_url, quiet = quiet)
6: fetchExtendedChromInfoFromUCSC(genome, goldenPath_url = goldenPath_url,
quiet = TRUE)
5: .fetch_sequence_info_for_UCSC_genome(genome)
4: fetchSequenceInfo(genome)
3: .class1(object)
2: as(fetchSequenceInfo(genome), "Seqinfo")
1: Seqinfo(genome = "hg19")
So then the problematic call looks like it is
url = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/"
res = RCurl::getURL(url)
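Since the call reportedly succeeds now and then when repeated, a stopgap is to retry it a few times before giving up. This is only a sketch: with_retries() is a hypothetical helper, not part of GenomeInfoDb or RCurl, and it papers over the connection problem rather than fixing it.

```r
# Hypothetical helper: call f() up to `times` times, waiting `wait` seconds
# between attempts, and return the first successful result.
with_retries <- function(f, times = 5L, wait = 1) {
  last_error <- NULL
  for (attempt in seq_len(times)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
    message(sprintf("attempt %d/%d failed: %s",
                    attempt, times, conditionMessage(result)))
    if (attempt < times) Sys.sleep(wait)
  }
  # Every attempt failed: re-signal the last error.
  stop(last_error)
}

# Usage (needs network access and the relevant packages):
# res <- with_retries(function() RCurl::getURL(url))
# si  <- with_retries(function() GenomeInfoDb::Seqinfo(genome = "hg19"))
```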
This is an old issue and the OP didn't follow up so I'm closing it.
Hi, I'm trying to use the CERES package which depends on GenomeInfoDb. It fails due to the following error:
Seqinfo(genome="hg19")
Error in function (type, msg, asError = TRUE) : Failed connect to ftp.ncbi.nlm.nih.gov:21; Connection refused
I've checked fetchExtendedChromInfoFromUCSC and hg19 appears to be supported. Is this a problem with my proxy settings? If so, could I download the file and point the function to it instead?