Bioconductor / GenomeInfoDb

Utilities for manipulating chromosome names, including modifying them to follow a particular naming style
https://bioconductor.org/packages/GenomeInfoDb

unable to find an inherited method for function ‘seqinfo<-’ for signature ‘"TxDb"’ #12

Closed: lima1 closed this issue 4 years ago

lima1 commented 4 years ago

Hi,

I noticed that seqlevelsStyle<- does not work anymore for TxDb objects:

> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> seqlevelsStyle(txdb)
[1] "UCSC"
> seqlevelsStyle(txdb) <- "NCBI"
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘seqinfo<-’ for signature ‘"TxDb"’

Has this functionality been removed permanently, or is it a bug?

Thanks, Markus

mtmorgan commented 4 years ago

This is not intended. Are your packages current (check with BiocManager::valid())? What is your sessionInfo()? Based on the answers, perhaps @hpages can also tell us whether you should update particular packages (from source).

lima1 commented 4 years ago

I think so. I just updated to a fresh R 4.0.2 installation after seeing my build failure in devel (one of the two test failures was a bug in my own package):

https://bioconductor.org/checkResults/devel/bioc-LATEST/PureCN/malbec1-checksrc.html

Output of sessionInfo(), followed by BiocManager::valid():

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] org.Hs.eg.db_3.11.4                    
 [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [3] GenomicFeatures_1.41.0                 
 [4] AnnotationDbi_1.51.0                   
 [5] PureCN_1.19.3                          
 [6] testthat_2.3.2                         
 [7] VariantAnnotation_1.35.2               
 [8] Rsamtools_2.5.2                        
 [9] Biostrings_2.57.2                      
[10] XVector_0.29.3                         
[11] SummarizedExperiment_1.19.5            
[12] DelayedArray_0.15.5                    
[13] matrixStats_0.56.0                     
[14] Matrix_1.2-18                          
[15] Biobase_2.49.0                         
[16] GenomicRanges_1.41.5                   
[17] GenomeInfoDb_1.25.5                    
[18] IRanges_2.23.10                        
[19] S4Vectors_0.27.12                      
[20] BiocGenerics_0.35.4                    
[21] DNAcopy_1.63.0                         

loaded via a namespace (and not attached):
 [1] VGAM_1.1-3               colorspace_1.4-1         ellipsis_0.3.1          
 [4] rprojroot_1.3-2          futile.logger_1.4.3      fs_1.4.1                
 [7] rstudioapi_0.11          listenv_0.8.0            remotes_2.1.1           
[10] bit64_0.9-7              fansi_0.4.1              codetools_0.2-16        
[13] splines_4.0.2            R.methodsS3_1.8.0        pkgload_1.1.0           
[16] dbplyr_1.4.4             R.oo_1.23.0              BiocManager_1.30.10     
[19] compiler_4.0.2           httr_1.4.1               backports_1.1.8         
[22] assertthat_0.2.1         cli_2.0.2                formatR_1.7             
[25] prettyunits_1.1.1        tools_4.0.2              gtable_0.3.0            
[28] glue_1.4.1               GenomeInfoDbData_1.2.3   dplyr_1.0.0             
[31] rappdirs_0.3.1           Rcpp_1.0.4.6             vctrs_0.3.1             
[34] rhdf5filters_1.1.0       rtracklayer_1.49.3       stringr_1.4.0           
[37] globals_0.12.5           ps_1.3.3                 lifecycle_0.2.0         
[40] devtools_2.3.0           XML_3.99-0.3             future_1.17.0           
[43] zlibbioc_1.35.0          scales_1.1.1             aroma.light_3.19.0      
[46] BSgenome_1.57.1          hms_0.5.3                rhdf5_2.33.3            
[49] lambda.r_1.2.4           RColorBrewer_1.1-2       curl_4.3                
[52] memoise_1.1.0            gridExtra_2.3            ggplot2_3.3.2           
[55] biomaRt_2.45.1           stringi_1.4.6            RSQLite_2.2.0           
[58] desc_1.2.0               PSCBS_0.65.0             pkgbuild_1.0.8          
[61] BiocParallel_1.23.0      rlang_0.4.6              pkgconfig_2.0.3         
[64] bitops_1.0-6             lattice_0.20-41          purrr_0.3.4             
[67] Rhdf5lib_1.11.2          GenomicAlignments_1.25.3 bit_1.1-15.2            
[70] processx_3.4.2           tidyselect_1.1.0         magrittr_1.5            
[73] R6_2.4.1                 generics_0.0.2           DBI_1.1.0               
[76] pillar_1.4.4             withr_2.2.0              RCurl_1.98-1.2          
[79] tibble_3.0.1             crayon_1.3.4             futile.options_1.0.1    
[82] BiocFileCache_1.13.0     progress_1.2.2           usethis_1.6.1           
[85] grid_4.0.2               data.table_1.12.8        blob_1.2.1              
[88] callr_3.4.3              digest_0.6.25            R.cache_0.14.0          
[91] R.utils_2.9.2            openssl_1.4.1            munsell_0.5.0           
[94] sessioninfo_1.1.1        askpass_1.1             

Bioconductor version '3.12'

  * 3 packages out-of-date
  * 1 packages too new

create a valid installation with

  BiocManager::install(c(
    "GenomeInfoDb", "jsonlite", "openssl", "roxygen2"
  ), update = TRUE, ask = FALSE)

more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date

mtmorgan commented 4 years ago

BiocManager suggests that you update GenomeInfoDb (and some other packages, but those don't seem relevant).

lima1 commented 4 years ago

Sorry for the confusion. GenomeInfoDb is the "too new" package: I tried both the current devel version and the GitHub version from last night, and both fail with the same error.

hpages commented 4 years ago

This is on me. I recently made a few changes to the seqlevelsStyle() setter (commit 11119c2f247264301e07f1d756c3ed29a6ae24c1), but unfortunately these changes broke it on TxDb objects. FWIW, the same thing is happening with BSgenome objects, and I'm in the process of fixing that. Next I'll take care of TxDb objects.

The improvement I've been working on is that the seqlevelsStyle() setter is now able to rename scaffolds and unconventional chromosome names when switching between UCSC and NCBI styles. For example, in the case of TxDb.Hsapiens.UCSC.hg19.knownGene, all the sequences will get renamed, not just the 25 chromosomes. Since GenomeInfoDb 1.25.3, this already works on GRanges objects:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
tx <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene)
tx
# GRanges object with 82960 ranges and 2 metadata columns:
#                 seqnames        ranges strand |     tx_id     tx_name
#                    <Rle>     <IRanges>  <Rle> | <integer> <character>
#       [1]           chr1   11874-14409      + |         1  uc001aaa.3
#       [2]           chr1   11874-14409      + |         2  uc010nxq.1
#       [3]           chr1   11874-14409      + |         3  uc010nxr.1
#       [4]           chr1   69091-70008      + |         4  uc001aal.1
#       [5]           chr1 321084-321115      + |         5  uc001aaq.2
#       ...            ...           ...    ... .       ...         ...
#   [82956] chrUn_gl000237        1-2686      - |     82956  uc011mgu.1
#   [82957] chrUn_gl000241   20433-36875      - |     82957  uc011mgv.2
#   [82958] chrUn_gl000243   11501-11530      + |     82958  uc011mgw.1
#   [82959] chrUn_gl000243   13608-13637      + |     82959  uc022brq.1
#   [82960] chrUn_gl000247     5787-5816      - |     82960  uc022brr.1
#   -------
#   seqinfo: 93 sequences (1 circular) from hg19 genome

seqlevelsStyle(tx) <- "NCBI"
tx
# GRanges object with 82960 ranges and 2 metadata columns:
#                       seqnames        ranges strand |     tx_id     tx_name
#                          <Rle>     <IRanges>  <Rle> | <integer> <character>
#       [1]                    1   11874-14409      + |         1  uc001aaa.3
#       [2]                    1   11874-14409      + |         2  uc010nxq.1
#       [3]                    1   11874-14409      + |         3  uc010nxr.1
#       [4]                    1   69091-70008      + |         4  uc001aal.1
#       [5]                    1 321084-321115      + |         5  uc001aaq.2
#       ...                  ...           ...    ... .       ...         ...
#   [82956] HSCHRUN_RANDOM_CTG30        1-2686      - |     82956  uc011mgu.1
#   [82957] HSCHRUN_RANDOM_CTG34   20433-36875      - |     82957  uc011mgv.2
#   [82958] HSCHRUN_RANDOM_CTG36   11501-11530      + |     82958  uc011mgw.1
#   [82959] HSCHRUN_RANDOM_CTG36   13608-13637      + |     82959  uc022brq.1
#   [82960] HSCHRUN_RANDOM_CTG40     5787-5816      - |     82960  uc022brr.1
#   -------
#   seqinfo: 93 sequences (1 circular) from 2 genomes (GRCh37.p13, hg19)

This will soon work directly on BSgenome and TxDb objects (next week).

Sorry for the temporary inconvenience.

H.

hpages commented 4 years ago

Fixed in GenomicFeatures 1.41.1 (requires GenomeInfoDb 1.25.7):

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

seqinfo(txdb)
# Seqinfo object with 93 sequences (1 circular) from hg19 genome:
#   seqnames       seqlengths isCircular genome
#   chr1            249250621       <NA>   hg19
#   chr2            243199373       <NA>   hg19
#   chr3            198022430       <NA>   hg19
#   chr4            191154276       <NA>   hg19
#   chr5            180915260       <NA>   hg19
#   ...                   ...        ...    ...
#   chrUn_gl000245      36651       <NA>   hg19
#   chrUn_gl000246      38154       <NA>   hg19
#   chrUn_gl000247      36422       <NA>   hg19
#   chrUn_gl000248      39786       <NA>   hg19
#   chrUn_gl000249      38502       <NA>   hg19

seqlevelsStyle(txdb)
# [1] "UCSC"

seqlevelsStyle(txdb) <- "NCBI"  # switch style

All the sequences got renamed except chrM, because it does not belong to the GRCh37.p13 assembly (see https://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19):

seqinfo(txdb)
# Seqinfo object with 93 sequences (1 circular) from 2 genomes (GRCh37.p13, hg19):
#   seqnames             seqlengths isCircular     genome
#   1                     249250621       <NA> GRCh37.p13
#   2                     243199373       <NA> GRCh37.p13
#   3                     198022430       <NA> GRCh37.p13
#   4                     191154276       <NA> GRCh37.p13
#   5                     180915260       <NA> GRCh37.p13
#   ...                         ...        ...        ...
#   HSCHRUN_RANDOM_CTG38      36651       <NA> GRCh37.p13
#   HSCHRUN_RANDOM_CTG39      38154       <NA> GRCh37.p13
#   HSCHRUN_RANDOM_CTG40      36422       <NA> GRCh37.p13
#   HSCHRUN_RANDOM_CTG41      39786       <NA> GRCh37.p13
#   HSCHRUN_RANDOM_CTG42      38502       <NA> GRCh37.p13

table(genome(txdb))
# GRCh37.p13       hg19 
#         92          1 

H.
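The fix above depends on minimum versions of two packages (GenomicFeatures 1.41.1, which requires GenomeInfoDb 1.25.7), so code that relies on the TxDb setter might want to check for them first. The following is a minimal sketch that uses only base R's system.file() and utils::packageVersion(). The function name has_txdb_style_fix() is hypothetical, and the thresholds are simply the versions reported in the comment above:

```r
# Hypothetical helper: reports whether the installed packages are at least the
# versions in which the TxDb seqlevelsStyle() setter fix was reported
# (GenomicFeatures 1.41.1, which requires GenomeInfoDb 1.25.7).
has_txdb_style_fix <- function() {
  required <- c(GenomicFeatures = "1.41.1", GenomeInfoDb = "1.25.7")
  ok <- vapply(names(required), function(pkg) {
    # system.file() returns "" when the package is not installed;
    # && short-circuits, so packageVersion() is only called on installed
    # packages and the namespace is never loaded.
    nzchar(system.file(package = pkg)) &&
      utils::packageVersion(pkg) >= required[[pkg]]
  }, logical(1))
  all(ok)
}

has_txdb_style_fix()
```

For example, a package test could run seqlevelsStyle(txdb) <- "NCBI" only when has_txdb_style_fix() returns TRUE, and skip otherwise. Checking installation with system.file() rather than requireNamespace() avoids loading the heavy Bioconductor namespaces just to read their versions.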

lima1 commented 4 years ago

Thanks a lot, that’s great!