RfastOfficial / Rfast

A collection of Rfast functions for data analysis. Note 1: The vast majority of the functions accept matrices only, not data.frames. Note 2: Do not pass matrices or vectors that have missing data (i.e. NAs). We do not check for them, and C++ internally transforms them into zeros (0), so you may get wrong results. Note 3: In general, make sure you give the correct input in order to get the correct output. We do no checks, and this is one of the many reasons we are fast.

documented input requirements in Rfast::g2Test not sufficient #57

Closed: spitzem closed this issue 2 years ago

spitzem commented 2 years ago

Hi everyone,

When I ran Rfast::g2Test, the R session repeatedly crashed. The issue seems to be with columns of the input data whose minimum is zero but whose unique integer values have a gap, say c(0, 1, 2, 3, 5). This situation arises, for example, with integers derived from factors that have levels unobserved in the dataset at hand.
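To illustrate how such a gap can arise, here is a minimal sketch with a made-up factor (the variable names and levels are hypothetical, not taken from my real data). One level is declared but never observed, so the zero-based integer codes skip a value:

```r
# Hypothetical factor: level "c" is declared but never observed.
f <- factor(c("a", "b", "d", "e", "a"),
            levels = c("a", "b", "c", "d", "e"))

# as.integer() gives 1-based level codes; subtract 1 for zero-based values.
codes <- as.integer(f) - 1L

# Distinct observed codes, in increasing order.
observed <- sort(unique(codes))

# TRUE when the observed codes are not exactly 0, 1, ..., k - 1.
has_gap <- !identical(observed, seq.int(0L, length(observed) - 1L))

print(observed)
print(has_gap)
```

Here the number of distinct values is smaller than the largest value plus one, so a dc computed by counting unique values no longer matches the range of values actually present in the column.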

I'm using R 4.1.2 and Rfast 2.0.6

The following example always crashes my R-Session:

ncol <- 3L
nrow <- 100L

values <- sample(
  x = 0L:4L,
  size = ncol * nrow,
  replace = TRUE
)

dataMat <- matrix(
  data = values,
  nrow = nrow,
  ncol = ncol
)

# This works
testResults1 <- Rfast::g2Test(
  data = dataMat,
  x = 1L,
  y = 2L, cs = 3L, dc = apply(dataMat, 2, data.table::uniqueN)
)

# This crashes my R session: column 3 now holds 0, 1, 2, 3, 5 (4 is missing)
dataMat[which(dataMat[, 3] == 4), 3] <- 5L
testResults2 <- Rfast::g2Test(
  data = dataMat,
  x = 1L,
  y = 2L, cs = 3L, dc = apply(dataMat, 2, data.table::uniqueN)
)
statlink commented 2 years ago

Hi spitzem, the function is very highly optimized and fast. The cost is, as you mentioned, that the minimum value must be zero and the numbers must be consecutive.

I am afraid there is nothing we can do.
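A possible workaround on the caller's side is to recode each column to consecutive zero-based integers before calling the function. The sketch below assumes integer-coded categorical columns; `recode_consecutive` is a hypothetical helper written for this thread, not part of Rfast:

```r
# Hypothetical helper (not part of Rfast): map the distinct values of each
# column to 0, 1, ..., k - 1, preserving their order.
recode_consecutive <- function(mat) {
  out <- matrix(0L, nrow = nrow(mat), ncol = ncol(mat))
  for (j in seq_len(ncol(mat))) {
    # match() returns the 1-based position of each value among the
    # sorted distinct values; subtracting 1 makes it zero-based.
    out[, j] <- match(mat[, j], sort(unique(mat[, j]))) - 1L
  }
  out
}

# Small made-up matrix; column 3 has gaps (values 0, 3, 5).
dataMat <- cbind(c(0L, 1L, 2L, 1L),
                 c(1L, 0L, 1L, 0L),
                 c(0L, 3L, 5L, 3L))

recoded <- recode_consecutive(dataMat)

# Number of distinct values per column, computed on the recoded data.
dc <- apply(recoded, 2, function(col) length(unique(col)))
```

The recoded matrix and `dc` can then be passed as in the original report, e.g. `Rfast::g2Test(data = recoded, x = 1L, y = 2L, cs = 3L, dc = dc)`. Recoding only relabels the categories; the observed counts per category are unchanged.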

spitzem commented 2 years ago

Hi statlink, yes, I posted it as a documentation issue because it took me some time to find out why the crash was happening.
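Until the requirement is documented, a guard like the following could be run before the call so that bad input produces an error message instead of a crashed session. This is only a sketch under the constraints stated in this thread (minimum zero, consecutive values, no NAs); `check_g2_input` is a hypothetical name, not an Rfast function:

```r
# Hypothetical pre-check (not part of Rfast): stop with a readable message
# unless every column contains exactly the values 0, 1, ..., k - 1.
check_g2_input <- function(mat) {
  stopifnot(is.matrix(mat), !anyNA(mat))
  for (j in seq_len(ncol(mat))) {
    vals <- sort(unique(mat[, j]))
    expected <- seq(0, length(vals) - 1)
    if (!isTRUE(all.equal(as.numeric(vals), as.numeric(expected)))) {
      stop(sprintf(
        "column %d: values must be exactly 0, 1, ..., %d (found: %s)",
        j, length(vals) - 1L, paste(vals, collapse = ", ")
      ))
    }
  }
  invisible(TRUE)
}

# Made-up inputs: 'good' satisfies the constraint, 'bad' has gaps in column 2.
good <- cbind(c(0L, 1L, 2L), c(1L, 0L, 1L))
bad  <- cbind(c(0L, 1L, 2L), c(0L, 3L, 5L))

ok  <- check_g2_input(good)
msg <- tryCatch(check_g2_input(bad), error = function(e) conditionMessage(e))
print(msg)
```

The check costs one pass over each column in R, so it stays outside the optimized code path and can be skipped once the input is known to be clean.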