RfastOfficial / Rfast

A collection of Rfast functions for data analysis. Note 1: The vast majority of the functions accept matrices only, not data.frames. Note 2: Do not pass matrices or vectors that have missing data (i.e. NAs). We do not check for them, and C++ internally transforms them into zeros (0), so you may get wrong results. Note 3: In general, make sure you give the correct input in order to get the correct output. We do no checks, and this is one of the many reasons we are fast.

Rfast::colMads() crashing R session on NA #24

Closed: thewoodsofcoding closed this issue 4 years ago

thewoodsofcoding commented 4 years ago

Hi, I use Rfast for robust calculations on huge data sets, say 1000+ rows per column. I want to calculate column medians and column MADs. Rfast::colMedians() and Rfast::colMads() have no issue with NA, but they only calculate the median or MAD if NAs make up less than 50% of a column's content. I want the result to be calculated even if only, let's say, 2 of 1000 values are not NA, so I use na.rm = T. I also have columns that contain only NA; those are returned as NA by Rfast::colMedians(mat, na.rm = T), so that works fine. But with Rfast::colMads(mat, na.rm = T) the R session crashes as soon as there are NA-only columns in mat. Sure, I can remove NA-only columns prior to the calculation, but I think this is a bug in Rfast::colMads(mat, na.rm = T). Below is an example for reproduction:

noNA <- c(1,2,3,4)
someNA <- c(1,2,3,NA)
equalNA <- c(1,2,NA,NA)
moreNA <- c(1,NA,NA,NA)
onlyNA <- c(NA,NA,NA,NA)
mat <- cbind(noNA, someNA, equalNA, moreNA, onlyNA)
mat
#       noNA someNA equalNA moreNA onlyNA
# [1,]    1      1       1      1     NA
# [2,]    2      2       2     NA     NA
# [3,]    3      3      NA     NA     NA
# [4,]    4     NA      NA     NA     NA
Rfast::colMedians(mat)
# [1] 2.5 2.5  NA  NA  NA    # too many NAs here
Rfast::colMedians(mat, na.rm = T)
# [1] 2.5 2.0 1.5 1.0  NA    #desired result
Rfast::colMads(mat)
# [1] 1.4826 1.4826     NA     NA     NA
Rfast::colMads(mat, na.rm = T)
# R session crashes
ManosPapadakis95 commented 4 years ago

It is not a bug. It's just that I never thought someone would actually have this case in their data. What do you want as the result?

thewoodsofcoding commented 4 years ago

Thanks for the fast response! The best would be to return NA for columns with only one value or no values at all (only NA).
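
To make the requested behaviour concrete, here is a minimal base-R sketch of it. It is not Rfast's implementation: `safe_col_mads` and its `min_n` argument are hypothetical names made up for illustration, and it relies on `stats::mad` rather than on Rfast's C++ code, so it will be much slower on large matrices.

```r
# Hypothetical helper (not part of Rfast): column-wise MAD that ignores NAs
# and returns NA for columns with fewer than `min_n` non-missing values.
safe_col_mads <- function(mat, min_n = 2L) {
  n_ok <- colSums(!is.na(mat))          # non-missing values per column
  out <- rep(NA_real_, ncol(mat))       # default result is NA
  names(out) <- colnames(mat)
  keep <- n_ok >= min_n                 # columns with enough data
  if (any(keep)) {
    out[keep] <- apply(mat[, keep, drop = FALSE], 2, stats::mad, na.rm = TRUE)
  }
  out
}

# Same matrix as in the reproduction example above
mat <- cbind(
  noNA    = c(1, 2, 3, 4),
  someNA  = c(1, 2, 3, NA),
  equalNA = c(1, 2, NA, NA),
  moreNA  = c(1, NA, NA, NA),
  onlyNA  = c(NA, NA, NA, NA)
)
res <- safe_col_mads(mat)
res
```

With this sketch, the moreNA and onlyNA columns come back as NA instead of crashing, while the other columns get an ordinary MAD. The same `keep` mask could also be used to drop such columns before calling Rfast::colMads(mat, na.rm = T) on an affected version.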

ManosPapadakis95 commented 4 years ago

Done.