Bioconductor / AnnotationHub

Client for the Bioconductor AnnotationHub web resource

"R CMD check" failure on Linux ARM64 #40

Closed martin-g closed 1 year ago

martin-g commented 1 year ago

Hello,

R CMD check fails on Linux ARM64 with the following output:

 R CMD check AnnotationHub_3.7.3.tar.gz 
* using log directory ‘/home/biocbuild/git/AnnotationHub.Rcheck’
* using R Under development (unstable) (2023-03-12 r83975)
* using platform: aarch64-unknown-linux-gnu (64-bit)
* R was compiled by
    gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
    GNU Fortran (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
* running under: Ubuntu 22.04.2 LTS
* using session charset: UTF-8
* checking for file ‘AnnotationHub/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘AnnotationHub’ version ‘3.7.3’
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘AnnotationHub’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking startup messages can be suppressed ... OK
* checking dependencies in R code ... WARNING
'::' or ':::' imports not declared from:
  ‘CompoundDb’ ‘ensembldb’ ‘keras’
Unexported objects imported by ':::' calls:
  ‘BiocFileCache:::.get_tbl_rid’ ‘S4Vectors:::selectSome’
  See the note in ?`:::` about the use of this operator.
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... WARNING
checkRd: (5) AnnotationHub-class.Rd:131-139: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:148-151: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:152-155: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:156-159: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:160-163: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:164-168: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:169-175: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:176-210: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:212-214: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:216-218: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:220-222: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:231-235: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:236-240: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:241-247: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:248-269: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:270-278: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:279-284: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:291-295: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:296-300: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:301-312: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:313-316: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:317-320: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:321-328: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:329-336: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:337-342: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:349-353: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-class.Rd:354-358: \item in \describe must have non-empty label
checkRd: (5) AnnotationHub-deprecated.Rd:28-34: \item in \describe must have non-empty label
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... WARNING
Functions or methods with usage in documentation object 'AnnotationHub-deprecated' but not in code:
  ‘display’

* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... ERROR
Running examples in ‘AnnotationHub-Ex.R’ failed
The error most likely occurred in:

> ### Name: AnnotationHub-objects
> ### Title: AnnotationHub objects and their related methods and functions
> ### Aliases: class:AnnotationHub AnnotationHub-class class:Hub Hub-class
> ###   .Hub AnnotationHub refreshHub mcols,Hub-method cache cache,Hub-method
> ###   cache,AnnotationHub-method cache<- cache<-,Hub-method hubUrl
> ###   hubUrl,Hub-method hubCache hubCache,Hub-method hubDate
> ###   hubDate,Hub-method package package,Hub-method removeCache isLocalHub
> ###   isLocalHub,Hub-method isLocalHub<- isLocalHub<-,Hub-method
> ###   possibleDates snapshotDate snapshotDate,Hub-method snapshotDate<-
> ###   snapshotDate<-,Hub-method removeResources
> ###   removeResources,missing-method removeResources,character-method
> ###   dbconn,Hub-method dbfile,Hub-method .db_close recordStatus
> ###   recordStatus,Hub-method length,Hub-method names,Hub-method
> ###   fileName,Hub-method $,Hub-method [[,Hub,character,missing-method
> ###   [[,Hub,numeric,missing-method [,Hub,character,missing-method
> ###   [,Hub,logical,missing-method [,Hub,numeric,missing-method
> ###   [<-,Hub,character,missing,Hub-method
> ###   [<-,Hub,logical,missing,Hub-method [<-,Hub,numeric,missing,Hub-method
> ###   subset,Hub-method query query,Hub-method as.list.Hub
> ###   as.list,Hub-method c,Hub-method show,Hub-method
> ###   show,AnnotationHubResource-method
> ### Keywords: classes methods
> 
> ### ** Examples
> 
>   ## create an AnnotationHub object
>   library(AnnotationHub)
>   ah = AnnotationHub()
snapshotDate(): 2023-03-21
> 
>   ## Summary of available records
>   ah
AnnotationHub with 69798 records
# snapshotDate(): 2023-03-21
# $dataprovider: Ensembl, BroadInstitute, UCSC, ftp://ftp.ncbi.nlm.nih.gov/g...
# $species: Homo sapiens, Mus musculus, Drosophila melanogaster, Bos taurus,...
# $rdataclass: GRanges, TwoBitFile, BigWigFile, EnsDb, Rle, OrgDb, ChainFile...
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH5012"]]' 

             title                                                      
  AH5012   | Chromosome Band                                            
  AH5013   | STS Markers                                                
  AH5014   | FISH Clones                                                
  AH5015   | Recomb Rate                                                
  AH5016   | ENCODE Pilot                                               
  ...        ...                                                        
  AH111330 | Zonotrichia_albicollis.Zonotrichia_albicollis-1.0.1.109.gtf
  AH111331 | Zosterops_lateralis_melanops.ASM128173v1.109.abinitio.gtf  
  AH111332 | Zosterops_lateralis_melanops.ASM128173v1.109.gtf           
  AH111333 | UCSC RepeatMasker annotations (Oct2022) for Human (hg38)   
  AH111334 | MassBank CompDb for release 2022.12.1                      
> 
>   ## Detail for a single record
>   ah[1]
AnnotationHub with 1 record
# snapshotDate(): 2023-03-21
# names(): AH5012
# $dataprovider: UCSC
# $species: Homo sapiens
# $rdataclass: GRanges
# $rdatadateadded: 2013-03-26
# $title: Chromosome Band
# $description: GRanges object from UCSC track 'Chromosome Band'
# $taxonomyid: 9606
# $genome: hg19
# $sourcetype: UCSC track
# $sourceurl: rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/hg19/database...
# $sourcesize: NA
# $tags: c("cytoBand", "UCSC", "track", "Gene", "Transcript",
#   "Annotation") 
# retrieve record with 'object[["AH5012"]]' 
> 
>   ## and what is the date we are using?
>   snapshotDate(ah)
[1] "2023-03-21"
> 
>   ## how many resources?
>   length(ah)
[1] 69798
> 
>   ## from which resources, is data available?
>   head(sort(table(ah$dataprovider), decreasing=TRUE))

                                                                                                   Ensembl 
                                                                                                     34906 
                                                                                            BroadInstitute 
                                                                                                     18248 
                                                                                                      UCSC 
                                                                                                     11193 
                                                                     ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ 
                                                                                                      1871 
                                                                                                  Haemcode 
                                                                                                       945 
FANTOM5,DLRP,IUPHAR,HPRD,STRING,SWISSPROT,TREMBL,ENSEMBL,CELLPHONEDB,BADERLAB,SINGLECELLSIGNALR,HOMOLOGENE 
                                                                                                       501 
> 
>   ## from which species, is data available ?
>   head(sort(table(ah$species),decreasing=TRUE))

           Homo sapiens            Mus musculus Drosophila melanogaster 
                  26554                    1809                     459 
             Bos taurus       Rattus norvegicus         Pan troglodytes 
                    332                     326                     318 
> 
>   ## what web service and local cache does this AnnotationHub point to?
>   hubUrl(ah)
[1] "https://annotationhub.bioconductor.org"
>   hubCache(ah)
[1] "/home/biocbuild/.cache/R/AnnotationHub"
> 
>   ### Examples ###
> 
>   ## One can  search the hub for multiple strings
>   ahs2 <- query(ah, c("GTF", "77","Ensembl", "Homo sapiens"))
> 
>   ## information about the file can be retrieved using
>   ahs2[1]
AnnotationHub with 1 record
# snapshotDate(): 2023-03-21
# names(): AH28812
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: GRanges
# $rdatadateadded: 2015-03-25
# $title: Homo_sapiens.GRCh38.77.gtf
# $description: Gene Annotation for Homo sapiens
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: GTF
# $sourceurl: ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sap...
# $sourcesize: 44454526
# $tags: c("GTF", "ensembl", "Gene", "Transcript", "Annotation") 
# retrieve record with 'object[["AH28812"]]' 
> 
>   ## one can further extract information from this show method
>   ## like the sourceurl using:
>   ahs2$sourceurl
[1] "ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz"
>   ahs2$description
[1] "Gene Annotation for Homo sapiens"
>   ahs2$title
[1] "Homo_sapiens.GRCh38.77.gtf"
> 
>   ## We can download a file by name like this (using a list semantic):
>   gr <- ahs2[[1]]
loading from cache
require(“GenomicRanges”)
Error: failed to load resource
  name: AH28812
  title: Homo_sapiens.GRCh38.77.gtf
  reason: error in evaluating the argument 'x' in selecting a method for function 'get': error reading from connection
Execution halted
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘runTests.R’
 OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking running R code from vignettes ...
  ‘AnnotationHub-HOWTO.Rmd’... OK
  ‘AnnotationHub.Rmd’ using ‘UTF-8’... OK
  ‘TroubleshootingTheCache.Rmd’ using ‘UTF-8’... OK
 NONE
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... OK
* DONE

Status: 1 ERROR, 3 WARNINGs
See
  ‘/home/biocbuild/git/AnnotationHub.Rcheck/00check.log’
for details.

Any idea what the problem could be?

lshep commented 1 year ago

Is GenomicRanges installed? This tries to connect to the Bioconductor Azure Data Lake. There might have been some connectivity issues?

I tried this morning on my local machine and the resource is available:

> ah =  AnnotationHub()
snapshotDate(): 2023-03-21
> 
> temp = ah[["AH28812"]]
loading from cache
require("GenomicRanges")
> temp
GRanges object with 2672001 ranges and 21 metadata columns:
            seqnames      ranges strand |   source       type     score
               <Rle>   <IRanges>  <Rle> | <factor>   <factor> <numeric>
        [1]        1 11869-14409      + |   havana gene              NA
        [2]        1 11869-14409      + |   havana transcript        NA
        [3]        1 11869-12227      + |   havana exon              NA
        [4]        1 12613-12721      + |   havana exon              NA
        [5]        1 13221-14409      + |   havana exon              NA
        ...      ...         ...    ... .      ...        ...       ...
  [2671997]       MT 15888-15953      + |  ensembl transcript        NA
  [2671998]       MT 15888-15953      + |  ensembl exon              NA
  [2671999]       MT 15956-16023      - |  insdc   gene              NA
  [2672000]       MT 15956-16023      - |  ensembl transcript        NA
  [2672001]       MT 15956-16023      - |  ensembl exon              NA
                phase         gene_id gene_version   gene_name gene_source
            <integer>     <character>    <numeric> <character> <character>
        [1]      <NA> ENSG00000223972            5     DDX11L1      havana
        [2]      <NA> ENSG00000223972            5     DDX11L1      havana
        [3]      <NA> ENSG00000223972            5     DDX11L1      havana
        [4]      <NA> ENSG00000223972            5     DDX11L1      havana
        [5]      <NA> ENSG00000223972            5     DDX11L1      havana
        ...       ...             ...          ...         ...         ...
  [2671997]      <NA> ENSG00000210195            2       MT-TT       insdc
  [2671998]      <NA> ENSG00000210195            2       MT-TT       insdc
  [2671999]      <NA> ENSG00000210196            2       MT-TP       insdc
  [2672000]      <NA> ENSG00000210196            2       MT-TP       insdc
  [2672001]      <NA> ENSG00000210196            2       MT-TP       insdc
                      gene_biotype   transcript_id transcript_version
                       <character>     <character>          <numeric>
        [1] transcribed_unproces..            <NA>                 NA
        [2] transcribed_unproces.. ENST00000456328                  2
        [3] transcribed_unproces.. ENST00000456328                  2
        [4] transcribed_unproces.. ENST00000456328                  2
        [5] transcribed_unproces.. ENST00000456328                  2
        ...                    ...             ...                ...
  [2671997]                Mt_tRNA ENST00000387460                  2
  [2671998]                Mt_tRNA ENST00000387460                  2
  [2671999]                Mt_tRNA            <NA>                 NA
  [2672000]                Mt_tRNA ENST00000387461                  2
  [2672001]                Mt_tRNA ENST00000387461                  2
            transcript_name transcript_source   transcript_biotype exon_number
                <character>       <character>          <character>   <numeric>
        [1]            <NA>              <NA>                 <NA>          NA
        [2]     DDX11L1-002            havana processed_transcript          NA
        [3]     DDX11L1-002            havana processed_transcript           1
        [4]     DDX11L1-002            havana processed_transcript           2
        [5]     DDX11L1-002            havana processed_transcript           3
        ...             ...               ...                  ...         ...
  [2671997]       MT-TT-201           ensembl              Mt_tRNA          NA
  [2671998]       MT-TT-201           ensembl              Mt_tRNA           1
  [2671999]            <NA>              <NA>                 <NA>          NA
  [2672000]       MT-TP-201           ensembl              Mt_tRNA          NA
  [2672001]       MT-TP-201           ensembl              Mt_tRNA           1
                    exon_id exon_version         tag     ccds_id  protein_id
                <character>    <numeric> <character> <character> <character>
        [1]            <NA>           NA        <NA>        <NA>        <NA>
        [2]            <NA>           NA        <NA>        <NA>        <NA>
        [3] ENSE00002234944            1        <NA>        <NA>        <NA>
        [4] ENSE00003582793            1        <NA>        <NA>        <NA>
        [5] ENSE00002312635            1        <NA>        <NA>        <NA>
        ...             ...          ...         ...         ...         ...
  [2671997]            <NA>           NA        <NA>        <NA>        <NA>
  [2671998] ENSE00001544475            2        <NA>        <NA>        <NA>
  [2671999]            <NA>           NA        <NA>        <NA>        <NA>
  [2672000]            <NA>           NA        <NA>        <NA>        <NA>
  [2672001] ENSE00001544473            2        <NA>        <NA>        <NA>
            protein_version
                  <numeric>
        [1]              NA
        [2]              NA
        [3]              NA
        [4]              NA
        [5]              NA
        ...             ...
  [2671997]              NA
  [2671998]              NA
  [2671999]              NA
  [2672000]              NA
  [2672001]              NA
  -------
  seqinfo: 270 sequences (1 circular) from GRCh38 genome

martin-g commented 1 year ago

GenomicRanges_1.51.4 is installed!

What is the URL of the Azure Data Lake resource that fails?

wget ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz works just fine on the command line, but I am not sure whether this is the failing URL.

lshep commented 1 year ago

It is not. That is the source URL for the initial data, not necessarily the final product. In more recent versions we provide GTFs by accessing and converting the Ensembl files directly, but this resource seems to predate that.
The API will hit "https://annotationhub.bioconductor.org/fetch/34252" and the actual file retrieved would be "https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData"

martin-g commented 1 year ago

Both URLs resolve to the same file and work fine on the test machine:

wget https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData
--2023-03-27 12:01:29--  https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData
Resolving bioconductorhubs.blob.core.windows.net (bioconductorhubs.blob.core.windows.net)... 52.239.247.164, 20.150.32.196, 52.239.247.68
Connecting to bioconductorhubs.blob.core.windows.net (bioconductorhubs.blob.core.windows.net)|52.239.247.164|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20121337 (19M) [binary/octet-stream]
Saving to: ‘Homo_sapiens.GRCh38.77.gtf.RData’

Homo_sapiens.GRCh38.77.gtf.RData                      100%[======================================================================================================================>]  19.19M  1.28MB/s    in 16s     

2023-03-27 12:01:46 (1.19 MB/s) - ‘Homo_sapiens.GRCh38.77.gtf.RData’ saved [20121337/20121337]

wget https://annotationhub.bioconductor.org/fetch/34252
--2023-03-27 12:02:44--  https://annotationhub.bioconductor.org/fetch/34252
Resolving annotationhub.bioconductor.org (annotationhub.bioconductor.org)... 52.73.93.102
Connecting to annotationhub.bioconductor.org (annotationhub.bioconductor.org)|52.73.93.102|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData [following]
--2023-03-27 12:02:45--  https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData
Resolving bioconductorhubs.blob.core.windows.net (bioconductorhubs.blob.core.windows.net)... 52.239.247.164, 52.239.247.68, 20.150.32.196
Connecting to bioconductorhubs.blob.core.windows.net (bioconductorhubs.blob.core.windows.net)|52.239.247.164|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20121337 (19M) [binary/octet-stream]
Saving to: ‘34252’

34252                                                 100%[======================================================================================================================>]  19.19M  1.24MB/s    in 16s     

2023-03-27 12:03:02 (1.19 MB/s) - ‘34252’ saved [20121337/20121337]

I'll try to investigate why the download may fail in R code!

Thank you for the hints, @lshep!

hpages commented 1 year ago

This is unlikely to be a Linux ARM64-specific error. It sounds more like a connectivity issue with your Linux ARM64 builder, @martin-g.

martin-g commented 1 year ago

It fails consistently when executed via R CMD check, while the wget https://... downloads succeed without problems or timeouts.

lshep commented 1 year ago

Out of curiosity, if you use httr::GET to download, does that succeed?

Are you behind any sort of proxy that would need to be set up for download?

martin-g commented 1 year ago

Do you mean lwp-request's GET? Running GET https://annotationhub.bioconductor.org/fetch/34252 downloaded the file and printed the binary directly in the terminal.

There is no HTTP(S) proxy!

martin-g commented 1 year ago

> library(httr)
> r <- GET("https://annotationhub.bioconductor.org/fetch/34252")
> r
Response [https://bioconductorhubs.blob.core.windows.net/annotationhub/ensembl/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.RData]
  Date: 2023-03-27 19:24
  Status: 200
  Content-Type: binary/octet-stream
  Size: 20.1 MB
<BINARY BODY>
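
A 200 status and a plausible size do not by themselves show that the payload deserializes in R. The following is a minimal sketch, not AnnotationHub's own download path, that fetches a URL to a temporary file with base R and then tries to `load()` it. `fetch_and_load` and its arguments are hypothetical names introduced for illustration; the expected byte count in the usage comment is the length wget reported earlier in this thread.

```r
# Hypothetical helper (not part of AnnotationHub): download `url` to a
# temporary file, then report its size and whether load() can read it.
fetch_and_load <- function(url, expected_bytes = NA_real_) {
  dest <- tempfile(fileext = ".RData")
  on.exit(unlink(dest), add = TRUE)

  # mode = "wb" keeps the binary payload intact on every platform.
  utils::download.file(url, dest, mode = "wb", quiet = TRUE)

  # Load into a scratch environment so nothing leaks into the workspace;
  # capture the error message instead of stopping.
  env <- new.env()
  err <- tryCatch({
    load(dest, envir = env)
    NA_character_
  }, error = function(e) conditionMessage(e))

  list(
    bytes   = file.size(dest),
    size_ok = is.na(expected_bytes) || file.size(dest) == expected_bytes,
    loaded  = is.na(err),
    objects = ls(env),
    error   = err
  )
}

# Usage on the build machine (requires network access):
# res <- fetch_and_load("https://annotationhub.bioconductor.org/fetch/34252",
#                       expected_bytes = 20121337)
# str(res)
```

If `loaded` is TRUE for a fresh download while the hub still fails, the problem is more likely in the locally cached copy than in the network path.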

martin-g commented 1 year ago

It looks like the local cache was corrupted! I removed /home/biocbuild/.cache/R/AnnotationHub and now the check passes!

Thank you for your help, @lshep & @hpages!
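
The "error reading from connection" message in the check log is consistent with a cached file that was cut short, for example by an interrupted download. As a minimal sketch under that assumption, the base-R check below tests whether a file can be read by `load()` and demonstrates it on a deliberately truncated copy. `is_loadable_rdata` is a hypothetical helper, not an AnnotationHub function, and it only applies to resources stored as .RData files.

```r
# Hypothetical helper: TRUE if `path` can be read by load(), FALSE otherwise.
is_loadable_rdata <- function(path) {
  tryCatch({
    suppressWarnings(load(path, envir = new.env()))
    TRUE
  }, error = function(e) FALSE)
}

# Build a valid .RData file, then a copy cut off after the first half of
# its bytes to mimic an interrupted download.
good <- tempfile(fileext = ".RData")
bad  <- tempfile(fileext = ".RData")
payload <- runif(1e4)
save(payload, file = good)
n <- file.size(good)
writeBin(readBin(good, what = "raw", n = n)[seq_len(n %/% 2)], bad)

is_loadable_rdata(good)  # TRUE: the intact file loads
is_loadable_rdata(bad)   # FALSE: the truncated file fails to load
```

When a cached file fails a check like this, deleting the cache directory as described above forces a fresh download; the package also documents `removeCache()` for removing the local cache.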