Closed jonas-sk closed 4 years ago
Hi, this is also likely caused by missing dependencies. I added checks for each imputation method: 08f006727cb4f78266172c68bdc94ae9dfafa6b5. Let me know if that helps. Thanks
Thank you for the quick answer! Dependencies are installed and I didn't get an error after updating rtemis and re-running the code.
The error appears when you recode any of the -88, -77, -99 columns in the data set uploaded in issue #24 to NA_character_.
You need to provide a minimal reproducible example.
That should be NA, not NA_character_; the latter converts your numeric column to character.
Even then, impute works for me.
Looking at the data, I guess these should all be converted to factors, right?
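To illustrate the point about NA versus NA_character_, here is a minimal base R sketch. The vector is made-up illustration data, not the data set from the issue, and it uses plain subassignment rather than dplyr::recode:

```r
# Made-up numeric vector containing two of the missing-value codes.
x <- c(1, -99, 3, -88)

# Plain NA is logical and is coerced to the type of the vector it is
# assigned into, so the column stays numeric.
x_na <- x
x_na[x_na %in% c(-99, -88, -77)] <- NA
class(x_na)   # "numeric"

# NA_character_ is a character value, so assigning it coerces the
# whole vector to character.
x_chr <- x
x_chr[x_chr %in% c(-99, -88, -77)] <- NA_character_
class(x_chr)  # "character"
```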
Apologies, I totally forgot to provide a reproducible example. The reason why I initially used NA_character_ is that my original table still had labels behind the values, which I removed in a previous step, so I was just respecting the column variable types. However, even using the table I have sent you, I still get the same error. The following reproduces the error for me:
read_csv("cases_test.csv") %>%
  mutate_all(list(~ dplyr::recode(., `-99` = NA_real_,
                                  `-88` = NA_real_,
                                  `-77` = NA_real_))) %>%
  preprocess(impute = TRUE, numeric2factor = TRUE)
This is the full output:
Parsed with column specification:
cols(
.default = col_double()
)
See spec(...) for full column specifications.
[2020-06-28 13:47:01 preprocess] Converting numeric to factor
[2020-06-28 13:47:01 preprocess] Imputing missing values using missRanger...
Missing value imputation by random forests
Error in `[.data.frame`(data, , relevantVars[[1]], drop = FALSE) :
undefined columns selected
Removing numeric2factor = TRUE leads to the same error.
Sidenote: If you use NA instead of NA_real_ (or NA_character_), the whole table, at least in my case, will consist of NAs.
This is a readr + missRanger issue: missRanger cannot handle column names beginning with numbers, which is generally best avoided in R.
Base R read.csv adds an X in front of the column name in those cases; read_csv does not.
This works:
read.csv("cases_test.csv") %>%
  mutate_all(list(~ dplyr::recode(., `-99` = NA_integer_,
                                  `-88` = NA_integer_,
                                  `-77` = NA_integer_))) %>%
  preprocess(impute = T, numeric2factor = T) -> dat
and in base:
dat <- read.csv("cases_test.csv")
dat[dat == "-99"] <- dat[dat == "-88"] <- dat[dat == "-77"] <- NA
dat <- preprocess(dat, numeric2factor = T, impute = T)
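For anyone who prefers to keep non-syntactic headers on import, here is a small sketch of the column-name behaviour described above and one possible workaround. The CSV text and its column names are made up for illustration; the idea is to rename the columns with make.names() before passing the data on to imputation:

```r
# Made-up two-column CSV whose headers start with a digit.
csv_text <- "1_age,2_score\n30,-99\n41,7"

# read.csv() runs headers through make.names() by default
# (check.names = TRUE), which prepends an X to names starting with a digit.
names(read.csv(text = csv_text))

# With check.names = FALSE the headers are kept verbatim, which is
# the situation the thread describes for read_csv().
raw <- read.csv(text = csv_text, check.names = FALSE)
names(raw)

# Rename to syntactic, unique names before imputing.
names(raw) <- make.names(names(raw), unique = TRUE)
names(raw)
```

The same last step should apply to a tibble returned by read_csv, since names<- works on any data frame.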
When using the preprocess command with impute = TRUE and otherwise default values (i.e. impute.type = "missRanger"), the following error occurs:
The error does not appear when using missForest.