Closed Yunuuuu closed 9 months ago
And the differences were large.
The rank_transform
function has documented differences from the colRanks
function:
Rank values are offset such that the rank of a 0 value is 0, and ties are handled by averaging ranks.
If you account for this in your test, you should find that all the values match up:
zero_rank <- matrixStats::colMins(mat)
obj <- as.matrix(obj)
dimnames(obj) <- NULL
waldo::compare(BPCells::add_cols(as.matrix(obj), zero_rank), mat)
Note that the method I showed for finding the zero_rank
assumes that you don't have negative values in the input matrix, but the BPCells function should still work properly even in the presence of negative input values.
Edit: to clarify the rationale here -- maintaining sparse matrices in BPCells is important for efficiency, which is why rank_transform
performs the offset so that the rank of 0 values stays at 0. This doesn't affect most downstream use-cases like spearman correlation or Wilcoxon rank-sum tests. But also, rank_transform
is not itself an exported function from BPCells due to this somewhat unusual behavior.
Thanks for the explanations, is it possible to provide an argument to control the behaviour, so we can implement a rowRanks
and colRanks
function easily
Or implement a colMins
function, so can combine both function to create rowRanks
and colRanks
function
Hey @Yunuuuu, thanks for taking a look this and the other issues/pull requests you've opened this weekend -- I appreciate the help! It will take me a little bit of time to review each of the issues and pull requests, probably Tuesday/Wednesday for most of them since I'm a bit busy for the start of this week.
Do you happen to know any C++? I think implementing a colMins function might be the cleanest way to get at your use-case. It's not on my immediate roadmap, but it could make a good first contribution to the C++ side of the code base and would help you solve your use-case here. I can of course offer pointers if you're interested, or even arrange a short video call in Pacific Time to give an overview of all the steps.
Thanks for your great work in this package, I can only write some small C++ codes. I have tried to understand the function matrix_value_histogram_cpp
and wilcoxon_rank_sum
, they are too complicated for me. I'm not famaliar many basic cpp function.
These issues were all found when I tested the BPCells basic function,I think I can make a pull request to add more unit tests
Hi @Yunuuuu, I think I've come up with a way you can do this fully with the existing BPCells functionality. I still think that the performance of an exact colRanks equivalent in BPCells is likely to be quite poor since the matrix will be dense thus requiring 10-20x more computation and storage for downstream operations than the sparse version that BPCells provides.
However, if you really want to implement this operation for compatibility reasons, then I think this should do the trick for you:
dense_col_rank <- function(m) {
zeros <- nrow(m) - matrix_stats(m, col_stats="nonzero")$col_stats["nonzero",]
negatives <- nrow(m) - colSums(m > 0) - zeros
rank_offsets <- ifelse(zeros != 0, negatives + (1+zeros)/2, 0)
add_cols(rank_transform(m, "col"), rank_offsets)
}
Sorry for the delayed reply, by following the code snippet you provided, I successfully created the rowRanks
and colRanks
function. For me, it's okay to close this thread or you would lik to keep it open for others who might have similar questions.
I just added the modified rank_transform
for others who may want to use it:
rank_transform <- function(object, axis = NULL, offset = TRUE, ...) {
main_axis <- BPCells::storage_order(object)
if (is.null(axis)) {
axis <- main_axis
cli::cli_inform("Doing {.val {axis}} rank transformation")
} else {
axis <- match.arg(axis, c("row", "col"))
if (axis != main_axis) {
cli::cli_warn(c(
"transposing the storage order for {.arg object}",
i = "{.arg axis} specified is different from the the storage axis of {.arg object} ({.val {main_axis}})" # nolint
))
object <- BPCells::transpose_storage_order(
matrix = object, ...
)
}
}
matrix <- BPCells:::rank_transform(mat = object, axis = axis)
if (!offset) {
if (axis == "row") {
nonzero <- BPCells::matrix_stats(object,
row_stats = "nonzero"
)$row_stats["nonzero", , drop = TRUE]
zeros <- ncol(object) - nonzero
negatives <- ncol(object) - rowSums(object > 0) - zeros
rank_offsets <- ifelse(zeros != 0,
negatives + (1 + zeros) / 2, 0
)
matrix <- matrix + rank_offsets
} else {
nonzero <- BPCells::matrix_stats(object,
col_stats = "nonzero"
)$col_stats["nonzero", , drop = TRUE]
zeros <- nrow(object) - nonzero
negatives <- nrow(object) - colSums(object > 0) - zeros
rank_offsets <- ifelse(zeros != 0,
negatives + (1 + zeros) / 2, 0
)
matrix <- t(t(matrix) + rank_offsets)
}
}
matrix
}
Glad it worked for you, and thanks for sharing that code snippet
Created on 2024-01-28 with reprex v2.0.2