HenrikBengtsson / matrixStats

R package: Methods that Apply to Rows and Columns of Matrices (and to Vectors)
https://cran.r-project.org/package=matrixStats
203 stars 34 forks source link

CRAN: Fix UBSan errors before 2024-09-19 #268

Closed HenrikBengtsson closed 1 month ago

HenrikBengtsson commented 1 month ago

CRAN notification

From: Prof Brian Ripley Date: Wed, Sep 4, 2024 at 10:35 PM Subject: CRAN package matrixStats To: henrikb@braju.com Cc: CRAN@r-project.org

Dear maintainer,

Please see the problems shown on https://cran.r-project.org/web/checks/check_results_matrixStats.html.

Please correct before 2024-09-19 to safely retain your package on CRAN.

Do remember to look at the 'Additional issues'.

The CRAN Team

Details

From https://www.stats.ox.ac.uk/pub/bdr/memtests/gcc-UBSAN/matrixStats/ and then https://www.stats.ox.ac.uk/pub/bdr/memtests/gcc-UBSAN/matrixStats/tests/rowSums2_subset.Rout:


R Under development (unstable) (2024-09-04 r87094) -- "Unsuffered Consequences"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library("matrixStats")
> 
> rowSums2_R <- function(x, na.rm = FALSE, ..., useNames = NA) {
+   ## FIXME: sum() may overflow for integers, whereas
+   ## base::rowSums() doesn't.  What should rowSums2() do?
+   ## apply(x, MARGIN = 1L, FUN = sum, na.rm = na.rm)
+   res <- rowSums(x, na.rm = na.rm)
+   if (is.na(useNames) || !useNames) names(res) <- NULL
+   res
+ }
> 
> colSums2_R <- function(x, na.rm = FALSE, ..., useNames = NA) {
+   ## FIXME: sum() may overflow for integers, whereas
+   ## base::colSums() doesn't.  What should colSums2() do?
+   ## apply(x, MARGIN = 2L, FUN = sum, na.rm = na.rm)
+   res <- colSums(x, na.rm = na.rm)
+   if (is.na(useNames) || !useNames) names(res) <- NULL
+   res
+ }
> 
> 
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> # Subsetted tests
> # - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> source("utils/validateIndicesFramework.R")
> x <- matrix(runif(6 * 6, min = -3, max = 3), nrow = 6, ncol = 6)
> storage.mode(x) <- "integer"
> 
> # To check names attribute
> dimnames <- list(letters[1:6], LETTERS[1:6])
> 
> # Test with and without dimnames on x
> for (setDimnames in c(TRUE, FALSE)) {
+   if (setDimnames) dimnames(x) <- dimnames
+   else dimnames(x) <- NULL
+   for (rows in index_cases) {
+     for (cols in index_cases) {
+       for (na.rm in c(TRUE, FALSE)) {
+         for (useNames in c(if (!matrixStats:::isUseNamesNADefunct()) NA, TRUE, FALSE)) {
+           validateIndicesTestMatrix(x, rows, cols,
+                                     ftest = rowSums2, fsure = rowSums2_R,
+                                     na.rm = na.rm, useNames = useNames)
+           validateIndicesTestMatrix(x, rows, cols,
+                                     fcoltest = colSums2, fsure = rowSums2_R,
+                                     na.rm = na.rm, useNames = useNames)
+         }
+       }
+     }
+   }
+ }
rowSums2_lowlevel_template.h:49:7: runtime error: null pointer passed as argument 1, which is declared to never be null
    #0 0x7fb157338be2 in rowSums2_int /data/gannet/ripley/R/packages/tests-gcc-SAN/matrixStats/src/rowSums2_lowlevel_template.h:49
    #1 0x7fb1573e8a6b in rowSums2 /data/gannet/ripley/R/packages/tests-gcc-SAN/matrixStats/src/rowSums2.c:54
    #2 0x58e2c1 in R_doDotCall /data/gannet/ripley/R/svn/R-devel/src/main/dotcode.c:780
    #3 0x656027 in bcEval_loop /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:8686
    #4 0x690d32 in bcEval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:7519
    #5 0x6a75ec in R_compileAndExecute /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:1947
    #6 0x6a8419 in do_for /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:2791
    #7 0x691f58 in Rf_eval /data/gannet/ripley/R/svn/R-devel/src/main/eval.c:1234
    #8 0x71742d in Rf_ReplIteration /data/gannet/ripley/R/svn/R-devel/src/main/main.c:265
    #9 0x717a91 in R_ReplConsole /data/gannet/ripley/R/svn/R-devel/src/main/main.c:317
    #10 0x717bd4 in run_Rmainloop /data/gannet/ripley/R/svn/R-devel/src/main/main.c:1219
    #11 0x717c22 in Rf_mainloop /data/gannet/ripley/R/svn/R-devel/src/main/main.c:1226
    #12 0x41a7f8 in main /data/gannet/ripley/R/svn/R-devel/src/main/Rmain.c:29
    #13 0x7fb167c2950f in __libc_start_call_main (/lib64/libc.so.6+0x2950f) (BuildId: 8257ee907646e9b057197533d1e4ac8ede7a9c5c)
    #14 0x7fb167c295c8 in __libc_start_main_alias_2 (/lib64/libc.so.6+0x295c8) (BuildId: 8257ee907646e9b057197533d1e4ac8ede7a9c5c)
    #15 0x41d054 in _start (/data/gannet/ripley/R/gcc-SAN/bin/exec/R+0x41d054) (BuildId: 7564f1d5a8a7225dd44f81c00873f7d73c5e65e6)

> 
> proc.time()
   user  system elapsed 
  9.857   0.317  10.710 
yaccos commented 1 month ago

I think this results from an edge case which occurs when calling rowSums2() with a zero-lenght row subset. In this case, R_alloc() returns a NULL pointer for the row subset. The code than calls memset() on this pointer. Because the lenght of the memory to write to is zero, this should behave like a no-op, but still appearently considered undefined behavior.

https://stackoverflow.com/questions/8597034/can-memset-be-called-with-a-null-pointer-if-the-size-is-0