Closed const-ae closed 1 year ago
Thanks again.
There's nothing that changed in the behavior of useNames = TRUE
since matrixStats 0.63.0;
library(matrixStats)
stopifnot(packageVersion("matrixStats") == "0.63.0")
X <- matrix(1:5, ncol = 1L)
Xs <- list(
none = X,
null = structure(X, dimnames = list(NULL, NULL))
)
ys <- lapply(Xs, FUN = colCumsums, useNames = TRUE)
str(ys)
gives
List of 2
$ none: int [1:5, 1] 1 3 6 10 15
$ null: int [1:5, 1] 1 3 6 10 15
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : NULL
What changed is that we're moving from useNames = NA
to useNames = TRUE
, by default (#232).
Base R treats empty dimnames the same as no dimnames, e.g.
> ys <- lapply(Xs, FUN = colSums)
> str(ys)
List of 2
$ none: num 15
$ null: num 15
ys <- lapply(Xs, FUN = colMeans)
> str(ys)
List of 2
$ none: num 3
$ null: num 3
ys <- lapply(Xs, FUN = apply, MARGIN = 2L, sum)
> str(ys)
List of 2
$ none: int 15
$ null: int 15
ys <- lapply(Xs, FUN = apply, MARGIN = 2L, mean)
> str(ys)
List of 2
$ none: num 3
$ null: num 3
Thus, this is how matrixStats should also do it.
matrixStats behaves like base R, except for the cumulative functions, e.g. all of the following returns unnamed results:
ys <- lapply(Xs, FUN = colMeans2, useNames = TRUE)
ys <- lapply(Xs, FUN = colMedians, useNames = TRUE)
ys <- lapply(Xs, FUN = colMads, useNames = TRUE)
ys <- lapply(Xs, FUN = colVars, useNames = TRUE)
ys <- lapply(Xs, FUN = colIQRs, useNames = TRUE)
ys <- lapply(Xs, FUN = colLogSumExps, useNames = TRUE)
ys <- lapply(Xs, FUN = colWeightedMeans, useNames = TRUE)
ys <- lapply(Xs, FUN = colWeightedMedians, useNames = TRUE)
However, the following, cumulative functions treats the two cases differently:
ys <- lapply(Xs, FUN = colCumsums, useNames = TRUE)
ys <- lapply(Xs, FUN = colCumprods, useNames = TRUE)
ys <- lapply(Xs, FUN = colCummins, useNames = TRUE)
ys <- lapply(Xs, FUN = colCummaxs, useNames = TRUE)
BTW, setting dimnames to character(0L)
is the same as setting them to NULL
in R, e.g.
> str(X)
int [1:5, 1] 1 2 3 4 5
> rownames(X) <- character(0)
> str(X)
int [1:5, 1] 1 2 3 4 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : NULL
> rownames(X) <- NULL
> str(X)
int [1:5, 1] 1 2 3 4 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : NULL
In other words, we don't have to treat character(0)
specially; it's taken care of by R itself.
There are more functions:
ys <- lapply(Xs, FUN = colRanges, useNames = TRUE)
ys <- lapply(Xs, FUN = colRanks, useNames = TRUE)
also preserves NULL dimnames.
I think what they have in common is that they call an internal C function setDimnames()
;
$ grep -F "setDimnames(" src/*.c
src/colRanges.c: setDimnames(ans, dimnames, ncols, ccols, 0, crows, TRUE);
src/colRanges.c: setDimnames(ans, dimnames, ncols, ccols, 0, crows, TRUE);
src/naming.c:void setDimnames(SEXP mat/*Answer matrix*/, SEXP dimnames, R_xlen_t nrows,
src/rowCummaxs.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCummaxs.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCummins.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCummins.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumprods.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumprods.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumsums.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumsums.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanges.c: setDimnames(ans, dimnames, nrows, crows, 0, ccols, FALSE);
src/rowRanges.c: setDimnames(ans, dimnames, nrows, crows, 0, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
which means we can probably fix this in a single location. I'll have a look.
@yaccos , could you please have a lock at
and see if the problem reported here, can be fixed there?
Yes, you can. If you add the special case test:
/* In case both elements of the dimname is NULL, we disregard the name
attribute completely in order to conform to base R behavior */
if (VECTOR_ELT(dimnames, 0) == R_NilValue && VECTOR_ELT(dimnames, 1) == R_NilValue) {
return;
}
at the top of the function, you should get the expected behavior. Besides, it annoyes me that many of the calls to setDimnames
is duplicated even if there is no good reason for it. The worst offender here is rowRanksWithTies
where the code for generating the answer is split across a multitude of cases, but the naming would be achieved the same way for all cases if it was put to the end of the function. I apologize for not being vigilent enough to discover this redundancy during the GSOC project.
https://github.com/HenrikBengtsson/matrixStats/blob/7a45f8515a968e52bd92f08e32a14acc97f8b2aa/src/rowRanksWithTies.c#L68-L401
Thanks for the prompt fix @yaccos ! I've added it to matrixStats 0.63.0-9010.
... the naming would be achieved the same way for all cases if it was put to the end of the function. I apologize for not being vigilent enough to discover this redundancy during the GSOC project.
If you've got time, feel free to do a PR for this. And absolutely no worries; it's really hard to be on top of everything at all time, and can take years and a fresh mind to sometimes spot even the most obvious things.
Thanks for the quick fix and sorry for the poor description.
There is one more function (colDiffs
) that returns empty dimnames:
mat_empty_names <- matrix(1, nrow = 5, ncol = 1, dimnames = list(NULL, NULL))
str(matrixStats::colCumsums(mat_empty_names))
#> num [1:5, 1] 1 2 3 4 5
str(matrixStats::colDiffs(mat_empty_names))
#> num [1:4, 1] 0 0 0 0
#> - attr(*, "dimnames")=List of 2
#> ..$ : NULL
#> ..$ : NULL
Created on 2023-06-01 with reprex v2.0.2
Thanks for the quick fix and sorry for the poor description.
There is one more function (
colDiffs
) that returns empty dimnames:mat_empty_names <- matrix(1, nrow = 5, ncol = 1, dimnames = list(NULL, NULL)) str(matrixStats::colCumsums(mat_empty_names)) #> num [1:5, 1] 1 2 3 4 5 str(matrixStats::colDiffs(mat_empty_names)) #> num [1:4, 1] 0 0 0 0 #> - attr(*, "dimnames")=List of 2 #> ..$ : NULL #> ..$ : NULL
Created on 2023-06-01 with reprex v2.0.2
colDiffs()
and rowDiffs()
have their own respective C-level functions for naming (https://github.com/HenrikBengtsson/matrixStats/blob/fa94b84cee51b619f10f29bfc48ee8a2b0ac7780/src/naming.c#L128 and https://github.com/HenrikBengtsson/matrixStats/blob/fa94b84cee51b619f10f29bfc48ee8a2b0ac7780/src/naming.c#L193). This means that the same fix must be applied to these locations as well.
Thanks both. Fixed in matrixStats 0.63.0-9011.
Continuing the discussion from #232.
With the new implementation of
useNames = TRUE
,matrixStats::colCumsums
and other functions that return a matrix, treat a matrix without a dimnames attribute anddimnames = list(NULL, NULL)
differently (v0.63.0-9008).Created on 2023-05-30 with reprex v2.0.2
Previously, they were treated the same (v0.63.0):
Created on 2023-05-30 with reprex v2.0.2
From the sparse matrix point-of-view, the old behavior was preferable because dgCMatrix treats matrices without a dimnames attribute and
dimnames = list(NULL, NULL)
the same.