Open eddelbuettel opened 4 weeks ago
I was able to reproduce this locally on macOS; I'll see if I can learn more.
Which one did you try? openxlsx
may be the smaller of the two.
Indeed, I tried with openxlsx
. Next step is to try and narrow down a reproducible example further...
This seems like potentially a bug in openxlsx
? Here's a more minimal example:
library(openxlsx)
trace(openxlsx:::read_workbook, quote(print(utils::ls.str())))
xlsxFile <- system.file("extdata", "readTest.xlsx", package = "openxlsx")
read.xlsx(xlsxFile, 2, startRow = 3, colNames = FALSE, detectDates = TRUE)
The function read_workbook
gets called with:
Browse[1]> ls.str()
clean_names : function (x, schar)
cols_in : int [1:88] 1 2 3 4 5 6 7 8 9 1 ...
hasColNames : logi FALSE
hasSepNames : chr "."
is_date : logi [1:88] FALSE FALSE FALSE FALSE FALSE FALSE ...
nRows : int 32
rows_in : int [1:88] 3 3 3 3 3 3 3 3 3 4 ...
skipEmptyCols : logi TRUE
skipEmptyRows : logi TRUE
string_inds : int(0)
v : chr [1:88] "1" "2" "3" "4" "5" "6" "7" "8" "9" "1" "2" "3" "4" "5" "6" "7" "8" "8" "2" "2" "3" "4" "4" ...
Note that string_inds
is an empty vector. That eventually gets read with code in read_workbook.cpp:503
:
st_inds0[0] = string_inds[0];
So they're attempting to read from a zero-length integer vector, which fails. I'm not sure why that would've snuck by tests in the past, though.
We were just talking to @JanMarvin the other day in another issue so let's call him. I am sure he heard from CRAN too ...
Any insight from your end?
It looks like vapour
might be running into something similar? I'm not positive, but there is code like:
GDALDatasetH ds = gdalraster::gdalH_open_dsn(dsource[0], 0);
where gdalH_open_dsn
has the definition:
// open the DSN, open a subdataset if sds > 0 (else you get the outer shell)
inline GDALDatasetH gdalH_open_dsn(const char * dsn, IntegerVector sds) {
GDALDatasetH DS;
DS = GDALOpen(dsn, GA_ReadOnly);
if (!DS) return nullptr;
if (sds[0] > 0 && gdal_has_subdataset((GDALDataset*) DS)) {
CharacterVector sdsnames = gdal_subdataset_1((GDALDataset*)DS, sds[0]);
if (sdsnames.length() > 0 && !sdsnames[0].empty()) {
GDALClose((GDALDataset*) DS);
DS = GDALOpen(sdsnames[0], GA_ReadOnly);
}
}
return DS;
}
That is, an IntegerVector is being constructed implicitly from the int 0
, which gives a zero-length integer vector rather than an integer vector of length 1 with value 0 (which the author probably expected / assumed). That implies the sds[0]
check in that function will then give an out-of-bounds access into a length-zero vector.
I also took a look at fixest
and it looks like another variation on the same issue -- an attempt to read / write into a zero-length vector. Here's a reproducible example (distilled from the failing example code)
library(fixest)
set.seed(0)
base = iris
names(base) = c("y", "x1", "x2", "x3", "fe1")
base$fe2 = rep(1:5, 30)
base$y[1:5] = NA
base$x1[4:8] = NA
base$x2[4:21] = NA
base$x3[110:111] = NA
base$fe1[110:118] = NA
base$fe2[base$fe2 == 1] = 0
base$fe3 = sample(letters[1:5], 150, TRUE)
base$period = rep(1:50, 3)
base$x_cst = 1
res = feols(c(y, x1) ~ 1 | fe1 | x2 ~ x3, base)
And indeed, the failure occurs at code like the following:
XtX(0, 0) = value;
where XtX
is a 0 x 0 matrix. It seems like this is the common thread between the various issues.
All that said, I'm not sure what we should do:
Hi @eddelbuettel I'm poking at it in https://github.com/ycphs/openxlsx/pull/472 (but afaik nobody is actively working on openxlsx
and I'm not its maintainer, I'm just following Rcpp
😄 )
@JanMarvin Whoops. My bad.
@kevinushey We do it some places (vector/Subsetter.h) and it might be hard to do it "universally" :-/
On the fixest
bug. Thanks @kevinushey for looking into it!
It was definitely a bug. Now fixed.
reservr
suffered from the same (UB working with length-0 input). Thank you for providing the hints here.
Hi! I maintain 3 packages which are affected, all using Rcpp with the same pattern.
Rcpp::NumericVector
or IntegerVector
called some_data
&some_data[0]
which I am using to get the pointer to the first element in the arrayis that ok? If so I guess the fix will be provided in Rcpp? If not, I can update my packages. is there another/better way to get a pointer to the first element in the array?
I don't think my issues involve length-0 vectors since in that case my code stops with an error before trying to get the pointer, for example https://github.com/tdhock/FLOPART/blob/main/src/interface.cpp
int data_count = data_vec.size();
if(data_count < 2){
Rcpp::stop("need at least two data points");
}
@tdhock The bug is most probably in your package. Rcpp can just warn (for now) or error (in a future release) so that you are aware of these bugs before CRAN UBSAN/ASAN checks are reported.
Note that the UBSAN/ASAN errors I see on CRAN are not on line 55, but on line 57, where you do this:
&label_type_vec[0], &label_start_vec[0], &label_end_vec[0], label_count,
And I don't see checks for length-0 vectos for these ones.
@tdhock You could consider doing what I just did: reducing the test surface by not running the tests offending UBSAN.
hi Inaki and Dirk thanks for the feedback, that is helpful.
indeed I will update my packages to avoid doing &some_vec[0]
on zero-length vectors, which actually can be provided by the user/tests sometimes. In the example above, my suggested fix is:
const int *label_type_ptr=0, *label_start_ptr=0, *label_end_ptr=0;
if(0 < label_count){
label_type_ptr = &label_type_vec[0];
label_start_ptr = &label_start_vec[0];
label_end_ptr = &label_end_vec[0];
}
int status = FLOPART
(&data_vec[0], &weight_vec[0],
data_count, penalty,
label_type_ptr, label_start_ptr, label_end_ptr, label_count,
Per incoming email
with short followup