Closed martin-g closed 1 year ago
Is GenomicRanges installed? This tries to connect to the Bioconductor Azure Data Lake. There might have been some connectivity issues?
I tried this morning on my local machine and the resource is available:
> ah = AnnotationHub()
snapshotDate(): 2023-03-21
>
> temp = ah[["AH28812"]]
loading from cache
require("GenomicRanges")
> temp
GRanges object with 2672001 ranges and 21 metadata columns:
seqnames ranges strand | source type score
<Rle> <IRanges> <Rle> | <factor> <factor> <numeric>
[1] 1 11869-14409 + | havana gene NA
[2] 1 11869-14409 + | havana transcript NA
[3] 1 11869-12227 + | havana exon NA
[4] 1 12613-12721 + | havana exon NA
[5] 1 13221-14409 + | havana exon NA
... ... ... ... . ... ... ...
[2671997] MT 15888-15953 + | ensembl transcript NA
[2671998] MT 15888-15953 + | ensembl exon NA
[2671999] MT 15956-16023 - | insdc gene NA
[2672000] MT 15956-16023 - | ensembl transcript NA
[2672001] MT 15956-16023 - | ensembl exon NA
phase gene_id gene_version gene_name gene_source
<integer> <character> <numeric> <character> <character>
[1] <NA> ENSG00000223972 5 DDX11L1 havana
[2] <NA> ENSG00000223972 5 DDX11L1 havana
[3] <NA> ENSG00000223972 5 DDX11L1 havana
[4] <NA> ENSG00000223972 5 DDX11L1 havana
[5] <NA> ENSG00000223972 5 DDX11L1 havana
... ... ... ... ... ...
[2671997] <NA> ENSG00000210195 2 MT-TT insdc
[2671998] <NA> ENSG00000210195 2 MT-TT insdc
[2671999] <NA> ENSG00000210196 2 MT-TP insdc
[2672000] <NA> ENSG00000210196 2 MT-TP insdc
[2672001] <NA> ENSG00000210196 2 MT-TP insdc
gene_biotype transcript_id transcript_version
<character> <character> <numeric>
[1] transcribed_unproces.. <NA> NA
[2] transcribed_unproces.. ENST00000456328 2
[3] transcribed_unproces.. ENST00000456328 2
[4] transcribed_unproces.. ENST00000456328 2
[5] transcribed_unproces.. ENST00000456328 2
... ... ... ...
[2671997] Mt_tRNA ENST00000387460 2
[2671998] Mt_tRNA ENST00000387460 2
[2671999] Mt_tRNA <NA> NA
[2672000] Mt_tRNA ENST00000387461 2
[2672001] Mt_tRNA ENST00000387461 2
transcript_name transcript_source transcript_biotype exon_number
<character> <character> <character> <numeric>
[1] <NA> <NA> <NA> NA
[2] DDX11L1-002 havana processed_transcript NA
[3] DDX11L1-002 havana processed_transcript 1
[4] DDX11L1-002 havana processed_transcript 2
[5] DDX11L1-002 havana processed_transcript 3
... ... ... ... ...
[2671997] MT-TT-201 ensembl Mt_tRNA NA
[2671998] MT-TT-201 ensembl Mt_tRNA 1
[2671999] <NA> <NA> <NA> NA
[2672000] MT-TP-201 ensembl Mt_tRNA NA
[2672001] MT-TP-201 ensembl Mt_tRNA 1
exon_id exon_version tag ccds_id protein_id
<character> <numeric> <character> <character> <character>
[1] <NA> NA <NA> <NA> <NA>
[2] <NA> NA <NA> <NA> <NA>
[3] ENSE00002234944 1 <NA> <NA> <NA>
[4] ENSE00003582793 1 <NA> <NA> <NA>
[5] ENSE00002312635 1 <NA> <NA> <NA>
... ... ... ... ... ...
[2671997] <NA> NA <NA> <NA> <NA>
[2671998] ENSE00001544475 2 <NA> <NA> <NA>
[2671999] <NA> NA <NA> <NA> <NA>
[2672000] <NA> NA <NA> <NA> <NA>
[2672001] ENSE00001544473 2 <NA> <NA> <NA>
protein_version
<numeric>
[1] NA
[2] NA
[3] NA
[4] NA
[5] NA
... ...
[2671997] NA
[2671998] NA
[2671999] NA
[2672000] NA
[2672001] NA
-------
seqinfo: 270 sequences (1 circular) from GRCh38 genome
GenomicRanges_1.51.4 is installed!
What is the URL of the Azure Data Lake that fails?
wget ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz
works just fine on the command line. But I am not sure whether this is the failing url.
It is not. That is the source URL for the initial data, not necessarily the final product. In more recent versions we provide GTFs by accessing and converting the Ensembl data directly, but this resource seems to predate that.
The API will hit
"https://annotationhub.bioconductor.org/fetch/34252"
and the actual file retrieved would be
"https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData"
Both urls resolve to the same file and work fine on the test machine:
wget https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData
--2023-03-27 12:01:29-- https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData
Resolving bioconductorhubs.blob.core.windows.net (bioconductorhubs.blob.core.windows.net)... 52.239.247.164, 20.150.32.196, 52.239.247.68
Connecting to bioconductorhubs.blob.core.windows.net (bioconductorhubs.blob.core.windows.net)|52.239.247.164|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20121337 (19M) [binary/octet-stream]
Saving to: ‘Homo_sapiens.GRCh38.77.gtf.RData’
Homo_sapiens.GRCh38.77.gtf.RData 100%[======================================================================================================================>] 19.19M 1.28MB/s in 16s
2023-03-27 12:01:46 (1.19 MB/s) - ‘Homo_sapiens.GRCh38.77.gtf.RData’ saved [20121337/20121337]
wget https://annotationhub.bioconductor.org/fetch/34252
--2023-03-27 12:02:44-- https://annotationhub.bioconductor.org/fetch/34252
Resolving annotationhub.bioconductor.org (annotationhub.bioconductor.org)... 52.73.93.102
Connecting to annotationhub.bioconductor.org (annotationhub.bioconductor.org)|52.73.93.102|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData [following]
--2023-03-27 12:02:45-- https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData
Resolving bioconductorhubs.blob.core.windows.net (bioconductorhubs.blob.core.windows.net)... 52.239.247.164, 52.239.247.68, 20.150.32.196
Connecting to bioconductorhubs.blob.core.windows.net (bioconductorhubs.blob.core.windows.net)|52.239.247.164|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20121337 (19M) [binary/octet-stream]
Saving to: ‘34252’
34252 100%[======================================================================================================================>] 19.19M 1.24MB/s in 16s
2023-03-27 12:03:02 (1.19 MB/s) - ‘34252’ saved [20121337/20121337]
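Both runs report the same length (20121337 bytes), but a byte-for-byte comparison is a stronger check that the redirect and the direct blob URL deliver the same file. A minimal sketch, assuming the two files saved by the `wget` calls above are still in the current directory; `same_file` is a hypothetical helper name, not part of any Bioconductor tooling:

```shell
# Hypothetical helper: succeeds only if the two files are byte-for-byte identical.
# cmp -s is silent and signals the result through its exit status.
same_file() {
    cmp -s "$1" "$2"
}

# File names as saved by the two wget calls above; skipped if they are absent.
if [ -f Homo_sapiens.GRCh38.77.gtf.RData ] && [ -f 34252 ]; then
    if same_file Homo_sapiens.GRCh38.77.gtf.RData 34252; then
        echo "identical"
    else
        echo "different"
    fi
fi
```

If the files differ even though the sizes match, that would point at a corrupted transfer rather than at the hub itself.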
I'll try to investigate why the download may fail in R code!
Thank you for the hints, @lshep !
This is unlikely to be a Linux ARM64 specific error. Sounds more like a connectivity issue with your Linux ARM64 builder @martin-g
It fails consistently when executed via R CMD check, while the wget https://... calls pass without problems or timeouts.
Out of curiosity, if you use httr::GET to download, does that succeed?
Are you behind any sort of proxy that would need to be set up for download?
Do you mean lwp-request's GET?
GET https://annotationhub.bioconductor.org/fetch/34252
downloaded it and printed the binary directly in the terminal.
There is no HTTP(S) proxy !
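One quick way to double-check the proxy question on the builder is to list the usual proxy environment variables for the user that runs the checks. This is only a sketch: `show_proxy_vars` is a hypothetical helper, and it covers environment variables only, not proxy settings configured elsewhere (for example in R startup files).

```shell
# Hypothetical helper: print any proxy-related environment variables
# (upper or lower case), or a short note if none are set.
show_proxy_vars() {
    env | grep -i -E '^(https?|ftp|no|all)_proxy=' || echo "no proxy variables set"
}

show_proxy_vars
```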
library(httr)
> r <- GET("https://annotationhub.bioconductor.org/fetch/34252")
> r
Response [https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData]
Date: 2023-03-27 19:24
Status: 200
Content-Type: binary/octet-stream
Size: 20.1 MB
<BINARY BODY>
It looks like the local cache was corrupted!
I removed /home/biocbuild/.cache/R/AnnotationHub and now the check passes!
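For anyone hitting the same symptom, the cleanup can be sketched as a small shell step. This is an illustration under assumptions, not an official procedure: the cache location is the one from this thread (under `$HOME/.cache/R` for the build user), and `reset_cache` is a hypothetical helper. It renames the directory instead of deleting it, so the suspect cache can still be inspected afterwards.

```shell
# Hypothetical helper: move a possibly corrupted cache directory out of the way.
# Prints the backup location, or nothing if the directory does not exist.
reset_cache() {
    cache_dir=$1
    [ -d "$cache_dir" ] || return 0
    backup="${cache_dir}.bak.$(date +%Y%m%d%H%M%S)"
    mv "$cache_dir" "$backup" && echo "$backup"
}

# Assumed location, taken from the path mentioned above; uncomment to run.
# reset_cache "$HOME/.cache/R/AnnotationHub"
```

As in this thread, the next run then starts from an empty cache and downloads the resources again.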
Thank you for your help, @lshep & @hpages !
Hello,
R CMD check fails on Linux ARM64 with the following output:
Any idea what could be the problem?