Closed Yunuuuu closed 9 months ago
Another issue related to this: `[` for ColBindMatrices or RowBindMatrices will also lose dimnames, because rbind2 and cbind2 remove dimnames:
https://github.com/bnprks/BPCells/blob/0d5652425096153e3121f656e48e05e33333a1e3/R/matrix.R#L1046
When the subset touches only one matrix, this just returns that matrix from the matrix_list slot (dimnames removed) without restoring its dimnames:
https://github.com/bnprks/BPCells/blob/0d5652425096153e3121f656e48e05e33333a1e3/R/matrix.R#L1111
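To make the failure mode concrete, here is a minimal sketch with toy S4 classes. `ToyMat`, `ToyRowBind`, `toy_mat`, `toy_rbind` and `toy_rows` are hypothetical stand-ins invented for this illustration, not the BPCells classes or functions. They only mimic the behaviour described above: names live once on the wrapper, children are stored without names, and a selection that lands in a single child returns that child.

```r
library(methods)

# Hypothetical stand-ins for illustration only; NOT the BPCells classes.
setClass("ToyMat", representation(data = "matrix", dimnames = "list"))
setClass("ToyRowBind", representation(matrix_list = "list", dimnames = "list"))

toy_mat <- function(m) new("ToyMat", data = unname(m), dimnames = dimnames(m))

# Mimics the described behaviour: names are kept once on the wrapper
# and stripped from every child.
toy_rbind <- function(a, b) {
  dn <- list(c(a@dimnames[[1]], b@dimnames[[1]]), a@dimnames[[2]])
  a@dimnames <- list(NULL, NULL)
  b@dimnames <- list(NULL, NULL)
  new("ToyRowBind", matrix_list = list(a, b), dimnames = dn)
}

# Row selection that falls entirely inside one child.
# restore = FALSE returns the child as stored (names lost);
# restore = TRUE copies the matching names back from the wrapper.
toy_rows <- function(x, i, restore = FALSE) {
  offset <- 0L
  for (child in x@matrix_list) {
    n <- nrow(child@data)
    if (all(i > offset & i <= offset + n)) {
      child@data <- child@data[i - offset, , drop = FALSE]
      if (restore) {
        child@dimnames <- list(x@dimnames[[1]][i], x@dimnames[[2]])
      }
      return(child)
    }
    offset <- offset + n
  }
  stop("selection spans several children; not handled in this sketch")
}

a <- toy_mat(matrix(1:4, 2, dimnames = list(c("r1", "r2"), c("c1", "c2"))))
b <- toy_mat(matrix(5:8, 2, dimnames = list(c("r3", "r4"), c("c1", "c2"))))
rb <- toy_rbind(a, b)

lost <- toy_rows(rb, 3:4)                  # child returned as stored
kept <- toy_rows(rb, 3:4, restore = TRUE)  # names copied back from the wrapper
```

Note that the restoring branch indexes the wrapper's row names with the selected rows `i`, so the names stay aligned with what was actually selected.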
I tried to fix this by restoring the dimnames, but in my tests that causes other issues:
setMethod("[", "RowBindMatrices", function(x, i, j, ...) {
  if (missing(x)) stop("x is missing in matrix selection")
  # Handle transpose via recursive call
  if (x@transpose) {
    return(t(t(x)[rlang::maybe_missing(j), rlang::maybe_missing(i)]))
  }
  i <- split_selection_index(i, nrow(x), rownames(x))
  j <- split_selection_index(j, ncol(x), colnames(x))
  # If we're just reordering rows/cols, do a standard matrix selection
  if (rlang::is_missing(i$subset) && rlang::is_missing(j$subset)) {
    return(callNextMethod(x, unsplit_selection(i), unsplit_selection(j)))
  }
  # if the length of our row selection is 0, do a standard matrix selection
  if (!rlang::is_missing(i$subset) && length(i$subset) == 0) {
    return(callNextMethod(x, unsplit_selection(i), unsplit_selection(j)))
  }
  x <- selection_fix_dims(x, rlang::maybe_missing(i$subset), rlang::maybe_missing(j$subset))
  last_row <- 0L
  new_mats <- list()
  new_dimnames <- list()
  rownms <- rownames(x)
  colnms <- colnames(x)
  for (mat in x@matrix_list) {
    row_start <- last_row + 1L
    row_end <- last_row + nrow(mat)
    if (!rlang::is_missing(i$subset)) {
      local_i <- i$subset[i$subset >= row_start & i$subset <= row_end] - last_row
      mat <- mat[local_i, ]
    }
    if (!rlang::is_missing(j$subset)) {
      # Only pass through the subset operation to a lower-level, not the shuffle
      mat <- mat[, j$subset]
    }
    if (nrow(mat) > 0) {
      # rbind2 will remove dimnames in matrix_list,
      # we should restore it if return a single matrix
      new_dimnames <- c(new_dimnames,
        list(list(rownms[row_start:row_end], colnms))
      )
      new_mats <- c(new_mats, mat)
    }
    last_row <- row_end
  }
  if (length(new_mats) > 1) {
    x@matrix_list <- new_mats
  } else if (length(new_mats) == 1) {
    x <- new_mats[[1]]
    if (matrix_is_transform(x) && !is(x, "RenameDims")) {
      x@dimnames <- new_dimnames[[1L]]
    }
  } else {
    stop("Subset RowBindMatrix error: got 0-length matrix_list after subsetting (please report this BPCells bug)")
  }
  if (!rlang::is_missing(i$reorder)) {
    x <- x[i$reorder, ]
  }
  if (!rlang::is_missing(j$reorder)) {
    x <- x[, j$reorder]
  }
  return(x)
})
By removing the lines at https://github.com/bnprks/BPCells/blob/0d5652425096153e3121f656e48e05e33333a1e3/R/matrix.R#L1046, `[` for ColBindMatrices or RowBindMatrices works fine. Is there any reason to remove dimnames?
If it's okay to remove these lines, I'll make a pull request.
The same applies to the `[` method of MatrixSubset objects.
wrapMatrix removes dimnames:
https://github.com/bnprks/BPCells/blob/0d5652425096153e3121f656e48e05e33333a1e3/R/matrix.R#L30
When subsetting a MatrixSubset, it reuses the matrix in the matrix slot (whose dimnames have been removed):
https://github.com/bnprks/BPCells/blob/0d5652425096153e3121f656e48e05e33333a1e3/R/matrix.R#L763
Hi @Yunuuuu, I think I've just changed my mind on how to solve this and will go for some changes that I'll be pushing shortly to main rather than your suggested solution. All your code worked great, but I think I'd prefer to avoid being quite so fancy with the S4 generic dispatches since I find them a bit awkward to step through when I'm debugging. I'm sorry to have pushed a somewhat messy merge into your branch here and then decided to use a different solution. Feel free to do whatever force pushes etc. you need to get it cleaned up. And thanks again for having identified and clearly documented the issue in the first place -- I'll give you a shout-out in the changelog
To answer your question about setting dimnames to null in wrapMatrix -- the reason is that I've found when saving+loading objects from RDS the copies of dimnames can take up substantial space and memory in the resulting object. This wouldn't be an issue normally since R is pretty smart about sharing storage between identical objects but that seems to be messed up by an RDS round-trip. For very large matrices the dimnames can end up requiring a large amount of memory so I wanted to avoid that.
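A rough base-R sketch of the duplication effect, for readers who want to see it: the same character vector is referenced twice from one list, much as nested wrappers might each hold the same dimnames. The vector name `nms` and the sizes are made up for this illustration. It uses `serialize()` rather than `saveRDS()` so that the uncompressed byte count is visible; `saveRDS()` compresses by default, which can hide part of the effect on disk.

```r
# One vector of names, referenced twice from the same object.
nms <- sprintf("cell_%06d", seq_len(50000))
once  <- list(nms)
twice <- list(nms, nms)

# In-memory, both elements of `twice` can share storage, but serialization
# writes each vector out in full.
size_once  <- length(serialize(once, NULL))
size_twice <- length(serialize(twice, NULL))
ratio <- size_twice / size_once

# After the round trip the contents are equal, but they were read back
# as two separately allocated vectors.
back <- unserialize(serialize(twice, NULL))
same_contents <- identical(back[[1]], back[[2]])
```

With many nested layers each carrying its own copy of the dimnames, this cost grows with the number of layers, which is the situation the null-ing in wrapMatrix is meant to avoid.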
In the long run, I'm hoping to stop storing dimnames in the S4 object and instead implement a dimnames generic that can load dimnames from disk when needed. In the meantime, this does result in bugs popping up that I wouldn't otherwise have to worry about, as the cost of avoiding memory duplication issues.
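As a purely illustrative sketch of what "a dimnames generic that loads from disk" could look like (this is an assumption about one possible shape, not the planned BPCells design), the toy class below stores only a path. `LazyNamed`, its `shape` and `dimnames_path` slots, and the use of an RDS file for the names are all hypothetical.

```r
library(methods)

# Hypothetical class: keeps the path to the names, not the names themselves.
setClass("LazyNamed", representation(shape = "integer", dimnames_path = "character"))

# dimnames() reads the names from disk each time it is called.
setMethod("dimnames", "LazyNamed", function(x) readRDS(x@dimnames_path))

# Write some example names to a temporary file (assumed storage format).
path <- tempfile(fileext = ".rds")
saveRDS(list(c("g1", "g2"), c("c1", "c2", "c3")), path)

m <- new("LazyNamed", shape = c(2L, 3L), dimnames_path = path)
row_names <- dimnames(m)[[1]]
col_names <- dimnames(m)[[2]]
```

Because the object itself carries no names, wrapping or serializing it does not copy them; the trade-off is a disk read on every `dimnames()` call unless some caching is added.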
fix https://github.com/bnprks/BPCells/issues/67