Issue closed by peterch405 2 years ago.
Simplified reprex
suppressPackageStartupMessages(library(GenomicRanges))
x <- GRangesList()
seqlevels(x) <- as.character(c(1:22))
seqlengths(x) <- rep(1000, 22)
for (i in c("2", "13")) {
  x[[i]] <- GRanges(i, IRanges(1, 100))
  # Everything looks okay here
  print(x[[i]])
}
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] 2 1-100 *
#> -------
#> seqinfo: 22 sequences from an unspecified genome
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] 13 1-100 *
#> -------
#> seqinfo: 22 sequences from an unspecified genome
# But not okay here
x
#> GRangesList object of length 2:
#> $`2`
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] 1 1-100 *
#> -------
#> seqinfo: 22 sequences from an unspecified genome
#>
#> $`13`
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] 13 1-100 *
#> -------
#> seqinfo: 22 sequences from an unspecified genome
Created on 2022-03-22 by the reprex package (v2.0.1)
And here is the same example, but looking at the internals with str(). Note that on the first iteration @seqnames has only 1 level, but that changes after the loop completes.
suppressPackageStartupMessages(library(GenomicRanges))
x <- GRangesList()
seqlevels(x) <- as.character(c(1:22))
seqlengths(x) <- rep(1000, 22)
for (i in c("2", "13")) {
  x[[i]] <- GRanges(i, IRanges(1, 100))
  # Note that on the first iteration @seqnames has only 1 level.
  str(x[[i]])
}
#> Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#> ..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#> .. .. ..@ values : Factor w/ 1 level "2": 1
#> .. .. ..@ lengths : int 1
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots
#> .. .. ..@ start : int 1
#> .. .. ..@ width : int 100
#> .. .. ..@ NAMES : NULL
#> .. .. ..@ elementType : chr "ANY"
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#> .. .. ..@ values : Factor w/ 3 levels "+","-","*": 3
#> .. .. ..@ lengths : int 1
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
#> .. .. ..@ seqnames : chr [1:22] "1" "2" "3" "4" ...
#> .. .. ..@ seqlengths : int [1:22] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
#> .. .. ..@ is_circular: logi [1:22] NA NA NA NA NA NA ...
#> .. .. ..@ genome : chr [1:22] NA NA NA NA ...
#> ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#> .. .. ..@ rownames : NULL
#> .. .. ..@ nrows : int 1
#> .. .. ..@ listData : Named list()
#> .. .. ..@ elementType : chr "ANY"
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ elementType : chr "ANY"
#> ..@ metadata : list()
#> Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#> ..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#> .. .. ..@ values : Factor w/ 22 levels "1","2","3","4",..: 13
#> .. .. ..@ lengths : int 1
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots
#> .. .. ..@ start : int 1
#> .. .. ..@ width : int 100
#> .. .. ..@ NAMES : NULL
#> .. .. ..@ elementType : chr "ANY"
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#> .. .. ..@ values : Factor w/ 3 levels "+","-","*": 3
#> .. .. ..@ lengths : int 1
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
#> .. .. ..@ seqnames : chr [1:22] "1" "2" "3" "4" ...
#> .. .. ..@ seqlengths : int [1:22] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
#> .. .. ..@ is_circular: logi [1:22] NA NA NA NA NA NA ...
#> .. .. ..@ genome : chr [1:22] NA NA NA NA ...
#> ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#> .. .. ..@ rownames : NULL
#> .. .. ..@ nrows : int 1
#> .. .. ..@ listData : Named list()
#> .. .. ..@ elementType : chr "ANY"
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ elementType : chr "ANY"
#> ..@ metadata : list()
# But that changes after the loop completes
str(x[["2"]])
#> Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#> ..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#> .. .. ..@ values : Factor w/ 22 levels "1","2","3","4",..: 1
#> .. .. ..@ lengths : int 1
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots
#> .. .. ..@ start : int 1
#> .. .. ..@ width : int 100
#> .. .. ..@ NAMES : NULL
#> .. .. ..@ elementType : chr "ANY"
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#> .. .. ..@ values : Factor w/ 3 levels "+","-","*": 3
#> .. .. ..@ lengths : int 1
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
#> .. .. ..@ seqnames : chr [1:22] "1" "2" "3" "4" ...
#> .. .. ..@ seqlengths : int [1:22] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
#> .. .. ..@ is_circular: logi [1:22] NA NA NA NA NA NA ...
#> .. .. ..@ genome : chr [1:22] NA NA NA NA ...
#> ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#> .. .. ..@ rownames : NULL
#> .. .. ..@ nrows : int 1
#> .. .. ..@ listData : Named list()
#> .. .. ..@ elementType : chr "ANY"
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ elementType : chr "ANY"
#> ..@ metadata : list()
str(x[["13"]])
#> Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#> ..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#> .. .. ..@ values : Factor w/ 22 levels "1","2","3","4",..: 13
#> .. .. ..@ lengths : int 1
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots
#> .. .. ..@ start : int 1
#> .. .. ..@ width : int 100
#> .. .. ..@ NAMES : NULL
#> .. .. ..@ elementType : chr "ANY"
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#> .. .. ..@ values : Factor w/ 3 levels "+","-","*": 3
#> .. .. ..@ lengths : int 1
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
#> .. .. ..@ seqnames : chr [1:22] "1" "2" "3" "4" ...
#> .. .. ..@ seqlengths : int [1:22] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
#> .. .. ..@ is_circular: logi [1:22] NA NA NA NA NA NA ...
#> .. .. ..@ genome : chr [1:22] NA NA NA NA ...
#> ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#> .. .. ..@ rownames : NULL
#> .. .. ..@ nrows : int 1
#> .. .. ..@ listData : Named list()
#> .. .. ..@ elementType : chr "ANY"
#> .. .. ..@ elementMetadata: NULL
#> .. .. ..@ metadata : list()
#> ..@ elementType : chr "ANY"
#> ..@ metadata : list()
Created on 2022-03-22 by the reprex package (v2.0.1)
All of the above uses GenomicRanges v1.46.1. I quickly checked v1.42.0 and can confirm that this behaviour does not occur there.
suppressPackageStartupMessages(library(GenomicRanges))
#> Warning: package 'BiocGenerics' was built under R version 4.0.5
#> Warning: package 'GenomeInfoDb' was built under R version 4.0.5
x <- GRangesList()
seqlevels(x) <- as.character(c(1:22))
seqlengths(x) <- rep(1000, 22)
for (i in c("2", "13")) {
  x[[i]] <- GRanges(i, IRanges(1, 100))
  # Everything looks okay here
  print(x[[i]])
}
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] 2 1-100 *
#> -------
#> seqinfo: 22 sequences from an unspecified genome
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] 13 1-100 *
#> -------
#> seqinfo: 22 sequences from an unspecified genome
# And okay here
x
#> GRangesList object of length 2:
#> $`2`
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] 2 1-100 *
#> -------
#> seqinfo: 22 sequences from an unspecified genome
#>
#> $`13`
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] 13 1-100 *
#> -------
#> seqinfo: 22 sequences from an unspecified genome
Created on 2022-03-22 by the reprex package (v2.0.1)
I'm not yet sure what is causing this, but hopefully this helps track it down.
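The str() output above is consistent with integer codes being reused against a different level set: on the first iteration "2" is stored as code 1 of a one-level factor, and code 1 of the 22 seqlevels is "1", which is exactly the seqname that shows up after the loop ("13" is unaffected because its factor already had all 22 levels). The base-R sketch below only illustrates that general pitfall; it is not the actual S4Vectors code path, which this thread does not show. The names `all_levels`, `wrong` and `right` are hypothetical.

```r
# Illustration only: mimics the symptom, not the S4Vectors internals.
all_levels <- as.character(1:22)

# A one-level factor, like @seqnames on the first iteration: "2" has code 1.
f <- factor("2")

# Pitfall: keep the integer code but attach the full level set.
# Code 1 now points at level "1", so the value silently becomes "1".
wrong <- structure(as.integer(f), levels = all_levels, class = "factor")

# Safe: remap by label so "2" gets the code of "2" in the new level set.
right <- factor(as.character(f), levels = all_levels)

as.character(wrong)  # "1"
as.character(right)  # "2"
```

Remapping by label costs a match against the level set, but it is the only way to keep values stable when the level set changes.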
I downgraded to v1.44.0, and the behavior is not there either.
FWIW, this also happens when concatenating a GRanges to a CGRL (CompressedGRangesList):
> x[1]
GRangesList object of length 1:
$`2`
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 2 1-100 *
-------
seqinfo: 22 sequences from an unspecified genome
> c(x[1], GRanges("13", IRanges(1, 100)))
GRangesList object of length 2:
$`2`
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 1 1-100 *
-------
seqinfo: 22 sequences from an unspecified genome
[[2]]
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] 13 1-100 *
-------
seqinfo: 22 sequences from an unspecified genome
... or when concatenating these 2 GRanges objects:
x <- GRanges(seqinfo=Seqinfo(paste0("chr", 1:5), 1001:1005))
c(x, GRanges("chr4:1-11"))
# Error in validObject(ans) : invalid class “GRanges” object:
# 'seqlevels(seqinfo(x))' and 'levels(seqnames(x))' are not identical
in which case I get an error (in release with GenomicRanges 1.46.1 + S4Vectors 0.32.3 and in devel with GenomicRanges 1.47.6 + S4Vectors 0.33.12).
I think it's related to the problems reported above. It looks like the various incorrect GRangesList objects that each of you got with your MREs fail to pass validObject(grl, complete=TRUE). Taking a closer look now...
I can confirm that it does not pass validObject() with complete = TRUE:
> x <- GRangesList()
> seqlevels(x) <- as.character(c(1:22))
> seqlengths(x) <- rep(1000, 22)
> x[["2"]] <- GRanges("2", IRanges(1, 100))
> validObject(x, complete = TRUE)
Error in validObject(x, complete = TRUE) :
invalid class "CompressedGRangesList" object: In slot "unlistData" of class "GRanges":
'seqlevels(seqinfo(x))' and 'levels(seqnames(x))' are not identical
Fixed in S4Vectors 0.32.4 (https://github.com/Bioconductor/S4Vectors/commit/6703ee891a678e8ae474bb5c8a5dbdd49b67b9bf) and S4Vectors 0.33.13 (https://github.com/Bioconductor/S4Vectors/commit/bba09748db55bccf3d62bdb66a53a1f86074141f).
Darn, I introduced this nasty regression in November in release and devel!
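A small base-R sketch for checking whether a given S4Vectors version string is at or past the fixed versions named above (0.32.4 in release, 0.33.13 in devel). The helper `has_s4vectors_fix()` and the lookup `fixed_in` are hypothetical; the helper returns NA for any series other than 0.32 and 0.33, since those are the only two this thread names.

```r
# Fixed versions named in the thread, keyed by "major.minor" series.
fixed_in <- c("0.32" = "0.32.4", "0.33" = "0.33.13")

# Hypothetical helper: TRUE if version string `v` is at or past the fix for
# its series, FALSE if it predates it, NA for series not named in the thread.
has_s4vectors_fix <- function(v) {
  v <- package_version(v)
  parts <- unclass(v)[[1]]                # integer components, e.g. 0 32 3
  series <- paste(parts[1:2], collapse = ".")
  if (!series %in% names(fixed_in)) return(NA)
  v >= package_version(fixed_in[[series]])
}

# Usage (requires S4Vectors to be installed):
# has_s4vectors_fix(as.character(packageVersion("S4Vectors")))
```

Returning NA rather than guessing keeps the check honest about which versions the thread actually covers.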
While working on this, I ran into:
setClass("A", slots=c(stuff="ANY"))
x <- new("A", stuff=11:14)
y <- `slot<-`(x, "stuff", value=99)
y
# An object of class "A"
# Slot "stuff":
# [1] 99
x
# An object of class "A"
# Slot "stuff":
# [1] 99
Ouch!
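For contrast, here is a minimal sketch (not from the thread) of the usual replacement syntax for the same toy class "A". It relies on R's ordinary copy-on-modify semantics: because `x` and `y` share a value when the assignment happens, R duplicates it before modifying `y`, so `x` keeps its original slot. The class is redefined so the block runs on its own.

```r
library(methods)  # setClass(), new() and slot() live here

setClass("A", slots = c(stuff = "ANY"))
x <- new("A", stuff = 11:14)

# Replacement-function syntax: y is modified, x is left untouched.
y <- x
slot(y, "stuff") <- 99

x@stuff  # still 11:14
y@stuff  # 99
```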
Thanks, Hervé! And yikes to that example.
I recently updated GenomicRanges from 1.42.0 to 1.46.1, and now my code seems to produce incorrect results. I have traced it back to adding items to a GRangesList inside a for loop: for some reason, the seqnames change at the end of the loop.
Reproducible example:
Results:
The first GRanges should have seqnames 2.