Bioconductor / GenomicRanges

Representation and manipulation of genomic intervals
https://bioconductor.org/packages/GenomicRanges

In v1.46.1 seqnames changed when added to GRangesList #65

Closed. peterch405 closed this issue 2 years ago

peterch405 commented 2 years ago

I recently updated GenomicRanges from 1.42.0 to 1.46.1, and my code now produces incorrect results. I have traced it back to adding items to a GRangesList inside a for loop: once the loop has finished, the seqnames of an element added earlier have changed.

Reproducible example:

library(GenomicRanges)  # provides GRanges(), IRanges(), seqlevels()

rmchr <- function(gr){
  seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
  return(gr)
}

assembly_info <- GenomeInfoDb::getChromInfoFromUCSC("hg19", assembled.molecules.only=TRUE, as.Seqinfo=TRUE)
assembly_info <- rmchr(assembly_info)
assembly_info <- assembly_info[c(seq(1,22), "MT","X")]

breaks.all.chroms <- GenomicRanges::GRangesList()

GenomeInfoDb::seqlevels(breaks.all.chroms) <- GenomeInfoDb::seqlevels(assembly_info)
GenomeInfoDb::seqlengths(breaks.all.chroms) <- GenomeInfoDb::seqlengths(assembly_info)

r1 <- GRanges(seqnames = 2, ranges = IRanges(start = 120740313, end = 120915597), seqlengths = c("2"=243199373))
r2 <- GRanges(seqnames = 13, ranges = IRanges(start = 36277904, end = 37489152), seqlengths = c("13"=115169878))

rlist <- list("2"=r1, "13"=r2)

for(i in c("2", "13")){
  breaks.all.chroms[[i]] <- rlist[[i]]
}

Results:


> breaks.all.chroms
GRangesList object of length 2:
$`2`
GRanges object with 1 range and 0 metadata columns:
      seqnames              ranges strand
         <Rle>           <IRanges>  <Rle>
  [1]        1 120740313-120915597      *
  -------
  seqinfo: 24 sequences from an unspecified genome

$`13`
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]       13 36277904-37489152      *
  -------
  seqinfo: 24 sequences from an unspecified genome

The first GRanges should have seqnames 2, not 1.

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1  IRanges_2.28.0       S4Vectors_0.32.3     BiocGenerics_0.40.0 

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3                            rtracklayer_1.54.0                        ModelMetrics_1.2.2.2                      R.methodsS3_1.8.1                        
  [5] tidyr_1.2.0                               knitr_1.37                                ggplot2_3.3.5                             bit64_4.0.5                              
  [9] DelayedArray_0.20.0                       R.utils_2.11.0                            data.table_1.14.2                         rpart_4.1.16                             
 [13] hwriter_1.3.2                             KEGGREST_1.34.0                           hardhat_0.2.0                             RCurl_1.98-1.6                           
 [17] doParallel_1.0.17                         generics_0.1.2                            snow_0.4-4                                GenomicFeatures_1.46.5                   
 [21] org.Mm.eg.db_3.14.0                       preprocessCore_1.56.0                     cowplot_1.1.1                             EnrichedHeatmap_1.24.0                   
 [25] RSQLite_2.2.10                            shadowtext_0.1.1                          proxy_0.4-26                              future_1.24.0                            
 [29] ggpointdensity_0.1.0                      tzdb_0.2.0                                bit_4.0.4                                 enrichplot_1.14.2                        
 [33] xml2_1.3.3                                lubridate_1.8.0                           SummarizedExperiment_1.24.0               assertthat_0.2.1                         
 [37] profileplyr_1.10.2                        viridis_0.6.2                             xfun_0.30                                 gower_1.0.0                              
 [41] hms_1.1.1                                 evaluate_0.15                             fansi_1.0.2                               restfulr_0.0.13                          
 [45] progress_1.2.2                            caTools_1.18.2                            dbplyr_2.1.1                              breakpointR_1.12.0                       
 [49] igraph_1.2.11                             DBI_1.1.2                                 purrr_0.3.4                               ellipsis_0.3.2                           
 [53] rGREAT_1.26.0                             dplyr_1.0.8                               biomaRt_2.50.3                            MatrixGenerics_1.6.0                     
 [57] vctrs_0.3.8                               Biobase_2.54.0                            caret_6.0-91                              cachem_1.0.6                             
 [61] withr_2.5.0                               ggforce_0.3.3                             GenomicAlignments_1.30.0                  treeio_1.18.1                            
 [65] prettyunits_1.1.1                         TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 cluster_2.1.2                             DOSE_3.20.1                              
 [69] ape_5.6-2                                 lazyeval_0.2.2                            crayon_1.5.0                              recipes_0.2.0                            
 [73] pkgconfig_2.0.3                           tweenr_1.0.2                              nlme_3.1-155                              nnet_7.3-17                              
 [77] rlang_1.0.2                               globals_0.14.0                            lifecycle_1.0.1                           filelock_1.0.2                           
 [81] BiocFileCache_2.2.1                       doSNOW_1.0.20                             polyclip_1.10-0                           matrixStats_0.61.0                       
 [85] tiff_0.1-11                               Matrix_1.4-0                              aplot_0.1.2                               chipseq_1.44.0                           
 [89] boot_1.3-28                               GlobalOptions_0.1.2                       pheatmap_1.0.12                           png_0.1-7                                
 [93] viridisLite_0.4.0                         rjson_0.2.21                              bitops_1.0-7                              R.oo_1.24.0                              
 [97] KernSmooth_2.23-20                        pROC_1.18.0                               Biostrings_2.62.0                         blob_1.2.2                               
[101] shape_1.4.6                               stringr_1.4.0                             qvalue_2.26.0                             ShortRead_1.52.0                         
[105] parallelly_1.30.0                         readr_2.1.2                               jpeg_0.1-9                                gridGraphics_0.5-1                       
[109] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2   scales_1.1.1                              memoise_2.0.1                             magrittr_2.0.2                           
[113] plyr_1.8.6                                gplots_3.1.1                              zlibbioc_1.40.0                           compiler_4.1.3                           
[117] scatterpie_0.1.7                          TxDb.Hsapiens.UCSC.hg38.knownGene_3.14.0  BiocIO_1.4.0                              RColorBrewer_1.1-2                       
[121] plotrix_3.8-2                             clue_0.3-60                               Rsamtools_2.10.0                          cli_3.2.0                                
[125] XVector_0.34.0                            listenv_0.8.0                             patchwork_1.1.1                           MASS_7.3-55                              
[129] tidyselect_1.1.2                          stringi_1.7.6                             sciStrandR_0.1.0                          yaml_2.3.5                               
[133] GOSemSim_2.20.0                           locfit_1.5-9.5                            latticeExtra_0.6-29                       ggrepel_0.9.1                            
[137] grid_4.1.3                                polynom_1.4-0                             fastmatch_1.1-3                           tools_4.1.3                              
[141] future.apply_1.8.1                        parallel_4.1.3                            circlize_0.4.14                           rstudioapi_0.13                          
[145] foreach_1.5.2                             TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2   gridExtra_2.3                             prodlim_2019.11.13                       
[149] farver_2.1.0                              ggraph_2.0.5                              digest_0.6.29                             BiocManager_1.30.16                      
[153] lava_1.6.10                               Rcpp_1.0.8.3                              org.Hs.eg.db_3.14.0                       httr_1.4.2                               
[157] ggdendro_0.1.23                           AnnotationDbi_1.56.2                      ComplexHeatmap_2.10.0                     colorspace_2.0-3                         
[161] XML_3.99-0.9                              splines_4.1.3                             yulab.utils_0.0.4                         tidytree_0.3.9                           
[165] graphlayouts_0.8.0                        ggplotify_0.1.0                           preseqR_4.0.0                             jsonlite_1.8.0                           
[169] ggtree_3.2.1                              soGGi_1.26.0                              tidygraph_1.2.0                           timeDate_3043.102                        
[173] ggfun_0.0.5                               ipred_0.9-12                              R6_2.5.1                                  breakpointRdata_1.12.0                   
[177] htmltools_0.5.2                           pillar_1.7.0                              glue_1.6.2                                fastmap_1.1.0                            
[181] BiocParallel_1.28.3                       class_7.3-20                              codetools_0.2-18                          ChIPseeker_1.30.3                        
[185] fgsea_1.20.0                              utf8_1.2.2                                lattice_0.20-45                           tibble_3.1.6                             
[189] curl_4.3.2                                gtools_3.9.2                              GO.db_3.14.0                              survival_3.3-1                           
[193] rmarkdown_2.13                            munsell_0.5.0                             e1071_1.7-9                               DO.db_2.9                                
[197] GetoptLong_1.0.5                          GenomeInfoDbData_1.2.7                    iterators_1.0.14                          reshape2_1.4.4                           
[201] gtable_0.3.0
PeteHaitch commented 2 years ago

Simplified reprex

suppressPackageStartupMessages(library(GenomicRanges))

x <- GRangesList()
seqlevels(x) <- as.character(c(1:22))
seqlengths(x) <- rep(1000, 22)

for (i in c("2", "13")) {
  x[[i]] <- GRanges(i, IRanges(1, 100))
  # Everything looks okay here
  print(x[[i]])
}
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]        2     1-100      *
#>   -------
#>   seqinfo: 22 sequences from an unspecified genome
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]       13     1-100      *
#>   -------
#>   seqinfo: 22 sequences from an unspecified genome

# But not okay here
x
#> GRangesList object of length 2:
#> $`2`
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]        1     1-100      *
#>   -------
#>   seqinfo: 22 sequences from an unspecified genome
#> 
#> $`13`
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]       13     1-100      *
#>   -------
#>   seqinfo: 22 sequences from an unspecified genome

Created on 2022-03-22 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Ubuntu 20.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_AU:en
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Melbourne
#>  date     2022-03-22
#>  pandoc   2.17.1.1 @ /usr/lib/rstudio/bin/quarto/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package          * version  date (UTC) lib source
#>  BiocGenerics     * 0.40.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  bitops             1.0-7    2021-04-24 [1] RSPM (R 4.1.0)
#>  cli                3.2.0    2022-02-14 [1] RSPM (R 4.1.2)
#>  digest             0.6.29   2021-12-01 [1] RSPM (R 4.1.2)
#>  evaluate           0.15     2022-02-18 [1] RSPM (R 4.1.0)
#>  fastmap            1.1.0    2021-01-25 [1] RSPM (R 4.1.0)
#>  fs                 1.5.2    2021-12-08 [1] RSPM (R 4.1.2)
#>  GenomeInfoDb     * 1.30.1   2022-01-30 [1] RSPM (R 4.1.2)
#>  GenomeInfoDbData   1.2.7    2021-10-28 [1] RSPM (R 4.1.1)
#>  GenomicRanges    * 1.46.1   2021-11-18 [1] RSPM (R 4.1.2)
#>  glue               1.6.2    2022-02-24 [1] RSPM (R 4.1.2)
#>  highr              0.9      2021-04-16 [1] RSPM (R 4.1.0)
#>  htmltools          0.5.2    2021-08-25 [1] RSPM (R 4.1.0)
#>  IRanges          * 2.28.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  knitr              1.37     2021-12-16 [1] RSPM (R 4.1.0)
#>  magrittr           2.0.2    2022-01-26 [1] RSPM (R 4.1.2)
#>  RCurl              1.98-1.6 2022-02-08 [1] RSPM (R 4.1.2)
#>  reprex             2.0.1    2021-08-05 [1] RSPM (R 4.1.0)
#>  rlang              1.0.2    2022-03-04 [1] RSPM (R 4.1.2)
#>  rmarkdown          2.13     2022-03-10 [1] RSPM (R 4.1.2)
#>  rstudioapi         0.13     2020-11-12 [1] RSPM (R 4.1.0)
#>  S4Vectors        * 0.32.3   2021-11-21 [1] RSPM (R 4.1.2)
#>  sessioninfo        1.2.2    2021-12-06 [1] RSPM (R 4.1.2)
#>  stringi            1.7.6    2021-11-29 [1] RSPM (R 4.1.2)
#>  stringr            1.4.0    2019-02-10 [1] RSPM (R 4.1.0)
#>  withr              2.5.0    2022-03-03 [1] RSPM (R 4.1.2)
#>  xfun               0.30     2022-03-02 [1] RSPM (R 4.1.2)
#>  XVector            0.34.0   2021-10-26 [1] RSPM (R 4.1.2)
#>  yaml               2.3.5    2022-02-21 [1] RSPM (R 4.1.0)
#>  zlibbioc           1.40.0   2021-10-26 [1] RSPM (R 4.1.2)
#>
#>  [1] /home/peter/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

PeteHaitch commented 2 years ago

And the same example, but looking at the internals with str(). Note that on the first iteration @seqnames has only 1 level ("2", stored as integer code 1); after the loop completes it has 22 levels, and the same code 1 now reads as "1".

suppressPackageStartupMessages(library(GenomicRanges))

x <- GRangesList()
seqlevels(x) <- as.character(c(1:22))
seqlengths(x) <- rep(1000, 22)

for (i in c("2", "13")) {
  x[[i]] <- GRanges(i, IRanges(1, 100))
  # Note that on first iteration @seqnames has only 1 level.
  str(x[[i]])
}
#> Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#>   ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#>   .. .. ..@ values         : Factor w/ 1 level "2": 1
#>   .. .. ..@ lengths        : int 1
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
#>   .. .. ..@ start          : int 1
#>   .. .. ..@ width          : int 100
#>   .. .. ..@ NAMES          : NULL
#>   .. .. ..@ elementType    : chr "ANY"
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#>   .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 3
#>   .. .. ..@ lengths        : int 1
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
#>   .. .. ..@ seqnames   : chr [1:22] "1" "2" "3" "4" ...
#>   .. .. ..@ seqlengths : int [1:22] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
#>   .. .. ..@ is_circular: logi [1:22] NA NA NA NA NA NA ...
#>   .. .. ..@ genome     : chr [1:22] NA NA NA NA ...
#>   ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. ..@ rownames       : NULL
#>   .. .. ..@ nrows          : int 1
#>   .. .. ..@ listData       : Named list()
#>   .. .. ..@ elementType    : chr "ANY"
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ elementType    : chr "ANY"
#>   ..@ metadata       : list()
#> Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#>   ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#>   .. .. ..@ values         : Factor w/ 22 levels "1","2","3","4",..: 13
#>   .. .. ..@ lengths        : int 1
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
#>   .. .. ..@ start          : int 1
#>   .. .. ..@ width          : int 100
#>   .. .. ..@ NAMES          : NULL
#>   .. .. ..@ elementType    : chr "ANY"
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#>   .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 3
#>   .. .. ..@ lengths        : int 1
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
#>   .. .. ..@ seqnames   : chr [1:22] "1" "2" "3" "4" ...
#>   .. .. ..@ seqlengths : int [1:22] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
#>   .. .. ..@ is_circular: logi [1:22] NA NA NA NA NA NA ...
#>   .. .. ..@ genome     : chr [1:22] NA NA NA NA ...
#>   ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. ..@ rownames       : NULL
#>   .. .. ..@ nrows          : int 1
#>   .. .. ..@ listData       : Named list()
#>   .. .. ..@ elementType    : chr "ANY"
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ elementType    : chr "ANY"
#>   ..@ metadata       : list()

# But that changes after loop is completed
str(x[["2"]])
#> Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#>   ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#>   .. .. ..@ values         : Factor w/ 22 levels "1","2","3","4",..: 1
#>   .. .. ..@ lengths        : int 1
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
#>   .. .. ..@ start          : int 1
#>   .. .. ..@ width          : int 100
#>   .. .. ..@ NAMES          : NULL
#>   .. .. ..@ elementType    : chr "ANY"
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#>   .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 3
#>   .. .. ..@ lengths        : int 1
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
#>   .. .. ..@ seqnames   : chr [1:22] "1" "2" "3" "4" ...
#>   .. .. ..@ seqlengths : int [1:22] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
#>   .. .. ..@ is_circular: logi [1:22] NA NA NA NA NA NA ...
#>   .. .. ..@ genome     : chr [1:22] NA NA NA NA ...
#>   ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. ..@ rownames       : NULL
#>   .. .. ..@ nrows          : int 1
#>   .. .. ..@ listData       : Named list()
#>   .. .. ..@ elementType    : chr "ANY"
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ elementType    : chr "ANY"
#>   ..@ metadata       : list()
str(x[["13"]])
#> Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#>   ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#>   .. .. ..@ values         : Factor w/ 22 levels "1","2","3","4",..: 13
#>   .. .. ..@ lengths        : int 1
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
#>   .. .. ..@ start          : int 1
#>   .. .. ..@ width          : int 100
#>   .. .. ..@ NAMES          : NULL
#>   .. .. ..@ elementType    : chr "ANY"
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#>   .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 3
#>   .. .. ..@ lengths        : int 1
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
#>   .. .. ..@ seqnames   : chr [1:22] "1" "2" "3" "4" ...
#>   .. .. ..@ seqlengths : int [1:22] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
#>   .. .. ..@ is_circular: logi [1:22] NA NA NA NA NA NA ...
#>   .. .. ..@ genome     : chr [1:22] NA NA NA NA ...
#>   ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. ..@ rownames       : NULL
#>   .. .. ..@ nrows          : int 1
#>   .. .. ..@ listData       : Named list()
#>   .. .. ..@ elementType    : chr "ANY"
#>   .. .. ..@ elementMetadata: NULL
#>   .. .. ..@ metadata       : list()
#>   ..@ elementType    : chr "ANY"
#>   ..@ metadata       : list()

Created on 2022-03-22 by the reprex package (v2.0.1)
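
One way to read the str() output above: the integer code stays at 1 while the levels table grows from the single level "2" to the 22 seqlevels, so the label flips from "2" to "1". The base-R sketch below reproduces that effect with a plain factor. It is only an analogy: the variable names are made up for illustration, and it does not go through the S4Vectors code involved in the bug.

```r
# Illustration with a plain base-R factor; not the S4Vectors code path.
all_levels <- as.character(1:22)

# A factor with the single level "2" stores the integer code 1.
f <- factor("2")
as.integer(f)

# Swapping in a longer levels table while keeping the code unchanged
# makes code 1 point at the first new level, "1".
relabelled <- structure(as.integer(f), levels = all_levels, class = "factor")
as.character(relabelled)

# Remapping by label instead keeps the value "2" (the code becomes 2).
remapped <- factor(as.character(f), levels = all_levels)
as.character(remapped)
```

The second variant is what one would expect when an object's seqlevels are extended: the codes are recomputed from the labels rather than carried over.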

PeteHaitch commented 2 years ago

The above all uses GenomicRanges v1.46.1. I quickly checked v1.42.0 and can confirm that this behaviour doesn't occur there.

suppressPackageStartupMessages(library(GenomicRanges))
#> Warning: package 'BiocGenerics' was built under R version 4.0.5
#> Warning: package 'GenomeInfoDb' was built under R version 4.0.5

x <- GRangesList()
seqlevels(x) <- as.character(c(1:22))
seqlengths(x) <- rep(1000, 22)

for (i in c("2", "13")) {
  x[[i]] <- GRanges(i, IRanges(1, 100))
  # Everything looks okay here
  print(x[[i]])
}
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]        2     1-100      *
#>   -------
#>   seqinfo: 22 sequences from an unspecified genome
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]       13     1-100      *
#>   -------
#>   seqinfo: 22 sequences from an unspecified genome

# And okay here
x
#> GRangesList object of length 2:
#> $`2`
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]        2     1-100      *
#>   -------
#>   seqinfo: 22 sequences from an unspecified genome
#> 
#> $`13`
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]       13     1-100      *
#>   -------
#>   seqinfo: 22 sequences from an unspecified genome

Created on 2022-03-22 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.3 (2020-10-10)
#>  os       CentOS Linux 7 (Core)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Melbourne
#>  date     2022-03-22
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package          * version  date       lib source
#>  BiocGenerics     * 0.36.1   2021-04-16 [1] Bioconductor
#>  bitops             1.0-7    2021-04-24 [1] CRAN (R 4.0.5)
#>  cli                2.5.0    2021-04-26 [1] CRAN (R 4.0.5)
#>  digest             0.6.27   2020-10-24 [1] CRAN (R 4.0.3)
#>  evaluate           0.14     2019-05-28 [1] CRAN (R 4.0.0)
#>  fs                 1.5.0    2020-07-31 [1] CRAN (R 4.0.2)
#>  GenomeInfoDb     * 1.26.7   2021-04-08 [1] Bioconductor
#>  GenomeInfoDbData   1.2.4    2020-10-28 [1] Bioconductor
#>  GenomicRanges    * 1.42.0   2020-10-27 [1] Bioconductor
#>  glue               1.4.2    2020-08-27 [1] CRAN (R 4.0.2)
#>  highr              0.9      2021-04-16 [1] CRAN (R 4.0.5)
#>  htmltools          0.5.1.1  2021-01-22 [1] CRAN (R 4.0.3)
#>  IRanges          * 2.24.1   2020-12-12 [1] Bioconductor
#>  knitr              1.33     2021-04-24 [1] CRAN (R 4.0.5)
#>  magrittr           2.0.1    2020-11-17 [1] CRAN (R 4.0.3)
#>  RCurl              1.98-1.2 2020-04-18 [1] CRAN (R 4.0.0)
#>  reprex             2.0.1    2021-08-05 [1] CRAN (R 4.0.3)
#>  rlang              0.4.11   2021-04-30 [1] CRAN (R 4.0.5)
#>  rmarkdown          2.8      2021-05-07 [1] CRAN (R 4.0.5)
#>  rstudioapi         0.13     2020-11-12 [1] CRAN (R 4.0.3)
#>  S4Vectors        * 0.28.1   2020-12-09 [1] Bioconductor
#>  sessioninfo        1.1.1    2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi            1.6.2    2021-05-17 [1] CRAN (R 4.0.5)
#>  stringr            1.4.0    2019-02-10 [1] CRAN (R 4.0.0)
#>  withr              2.4.2    2021-04-18 [1] CRAN (R 4.0.5)
#>  xfun               0.23     2021-05-15 [1] CRAN (R 4.0.5)
#>  XVector            0.30.0   2020-10-27 [1] Bioconductor
#>  yaml               2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
#>  zlibbioc           1.36.0   2020-10-27 [1] Bioconductor
#>
#>  [1] /stornext/Home/data/allstaff/h/hickey/R/x86_64-pc-linux-gnu-library/4.0
#>  [2] /stornext/System/data/apps/R/R-4.0.3/lib64/R/library
```

I'm not yet sure what is causing this but hopefully this can help track it down.

peterch405 commented 2 years ago

I downgraded to v1.44.0 and the behavior is not there either.

LiNk-NY commented 2 years ago

FWIW, this also happens when concatenating a GRanges to a CompressedGRangesList (CGRL):

> x[1]
GRangesList object of length 1:
$`2`
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        2     1-100      *
  -------
  seqinfo: 22 sequences from an unspecified genome
> c(x[1], GRanges("13", IRanges(1, 100)))
GRangesList object of length 2:
$`2`
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        1     1-100      *
  -------
  seqinfo: 22 sequences from an unspecified genome

[[2]]
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]       13     1-100      *
  -------
  seqinfo: 22 sequences from an unspecified genome
hpages commented 2 years ago

... or when concatenating these 2 GRanges objects:

x <- GRanges(seqinfo=Seqinfo(paste0("chr", 1:5), 1001:1005))
c(x, GRanges("chr4:1-11"))
# Error in validObject(ans) : invalid class “GRanges” object: 
#     'seqlevels(seqinfo(x))' and 'levels(seqnames(x))' are not identical

in which case I get an error (in release with GenomicRanges 1.46.1 + S4Vectors 0.32.3 and in devel with GenomicRanges 1.47.6 + S4Vectors 0.33.12).

I think it's related to the problems reported above. It looks like the incorrect GRangesList objects that each of you got with your MREs fail validObject(grl, complete=TRUE).

Taking a closer look now...

LiNk-NY commented 2 years ago

I can confirm that the object does not pass validObject() with complete = TRUE:

> x <- GRangesList()
> seqlevels(x) <- as.character(c(1:22))
> seqlengths(x) <- rep(1000, 22)
> x[["2"]] <- GRanges("2", IRanges(1, 100))
> validObject(x, complete = TRUE)
Error in validObject(x, complete = TRUE) : 
  invalid class "CompressedGRangesList" object: In slot "unlistData" of class "GRanges": 
    'seqlevels(seqinfo(x))' and 'levels(seqnames(x))' are not identical
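
A check like the one above could be wrapped in a small helper and run right after a loop that fills a GRangesList. This is a minimal sketch: passes_deep_validation() is a hypothetical name, not a function from GenomicRanges or S4Vectors, and it relies only on methods::validObject() signalling an error for an invalid object.

```r
# Hypothetical helper (not part of GenomicRanges or S4Vectors).
# Returns TRUE if 'x' and, recursively, its slots pass their validity
# checks, and FALSE if validObject() signals an error.
passes_deep_validation <- function(x) {
  tryCatch({
    methods::validObject(x, complete = TRUE)
    TRUE
  }, error = function(e) FALSE)
}
```

With the objects from this thread, one would call passes_deep_validation(x) after the loop and stop if it returns FALSE, instead of continuing with silently altered seqnames.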
hpages commented 2 years ago

Fixed in S4Vectors 0.32.4 (https://github.com/Bioconductor/S4Vectors/commit/6703ee891a678e8ae474bb5c8a5dbdd49b67b9bf) and S4Vectors 0.33.13 (https://github.com/Bioconductor/S4Vectors/commit/bba09748db55bccf3d62bdb66a53a1f86074141f).

Darn, I introduced this nasty regression in November in release and devel!
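
To see whether an installation already contains the fix, the S4Vectors version can be compared against the two version numbers given above. This is a rough sketch: has_seqnames_fix() is a made-up helper name, and it assumes that 0.33.x denotes the devel series at the time and that every later series includes the fix. Releases from before the regression are reported as FALSE here even though they may not be affected.

```r
# Hypothetical helper; the thresholds are the fixed versions named in
# this thread: 0.32.4 (release) and 0.33.13 (devel).
has_seqnames_fix <- function(v = utils::packageVersion("S4Vectors")) {
  v <- package_version(v)
  if (v >= "0.33.0" && v < "0.34.0") {
    # devel series: fixed from 0.33.13 onwards
    v >= "0.33.13"
  } else {
    # release series: fixed from 0.32.4 onwards
    v >= "0.32.4"
  }
}

has_seqnames_fix("0.32.3")  # FALSE: the version used in the reports above
has_seqnames_fix("0.32.4")  # TRUE
```

Calling has_seqnames_fix() without arguments checks the installed S4Vectors.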

While working on this, I ran into:

setClass("A", slots=c(stuff="ANY"))
x <- new("A", stuff=11:14)
y <- `slot<-`(x, "stuff", value=99)

y
# An object of class "A"
# Slot "stuff":
# [1] 99

x
# An object of class "A"
# Slot "stuff":
# [1] 99 

Ouch!
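
The surprise above comes from calling `slot<-` directly as a function. A minimal sketch of the usual alternative, which copies first and then uses ordinary replacement syntax so that the original object is left alone (same class and values as in the example above):

```r
library(methods)

setClass("A", slots = c(stuff = "ANY"))
x <- new("A", stuff = 11:14)

# Copy first, then assign with ordinary replacement syntax.
y <- x
y@stuff <- 99

x@stuff  # still 11 12 13 14
y@stuff  # 99
```

Writing slot(y, "stuff") <- 99 in place of y@stuff <- 99 should behave the same way, since it also goes through R's regular assignment mechanism.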

PeteHaitch commented 2 years ago

Thanks, Hervé! And yikes to that example