Open rdstern opened 2 years ago
@rdstern this is coming from the sjlabelled::as_numeric function.
To explain the issue: "8" becomes "28" because it is the 28th level as far as R is concerned. When R builds factor levels, it does not order them numerically; it sorts them as character strings, comparing character by character.
E.g.: 1, 10, 100283, 2, 3, 4, 4444, 5, etc., rather than 1, 2, 3, 4, 5, 10, 4444, 100283.
However, this is not what we would usually expect to happen here, because these are character variables, not factor variables.
What I suspect is happening here is that our variable is converted into a factor before it is converted as a numeric variable. I think this is due to the missing values.
# E.g., a character vector with no missing values converts fine with sjlabelled::as_numeric,
# but `as.numeric(as.factor(` is not so nice, since it replaces each value with its position in the level order
a <- c("2", "1", "11", "200")
a
#[1] "2" "1" "11" "200"
as.numeric(a)
#[1] 2 1 11 200
as.numeric(as.factor(a))
#[1] 3 1 2 4
sjlabelled::as_numeric(a, keep.labels = TRUE)
#[1] 2 1 11 200
# E.g., a character vector with missing (blank) values is converted by sjlabelled::as_numeric as if it were a factor,
# i.e., each value is replaced with its position in the level order
d <- c("2", "1", "11", "200", "")
as.numeric(d)
as.numeric(as.factor(d))
sjlabelled::as_numeric(d, keep.labels = TRUE)
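To make the level-order effect concrete, here is a minimal base R sketch (it does not use sjlabelled, so it only illustrates the factor behaviour, not what as_numeric does internally). The names `f`, `codes` and `values` are introduced here just for illustration.

```r
d <- c("2", "1", "11", "200", "")

# Converting to a factor sorts the unique values as strings;
# the blank string is one of the levels.
f <- factor(d)
print(levels(f))

# as.numeric() on a factor returns the level positions, not the values.
codes <- as.numeric(f)
print(codes)

# Going through character first recovers the values; the blank becomes NA.
values <- as.numeric(as.character(f))
print(values)
```

Comparing `codes` with `values` shows the same kind of mismatch as "8" becoming "28" in the data.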
One way around this would be to replace your blank values with the numeric value you want to give them, for example 0 or -99. That then works:
x <- readRDS("C:/Users/lclem/Downloads/abundant/abundant.RDS")
x$split5 <- ifelse(x$split5 == "", 0, x$split5)
sjlabelled::as_numeric(x$split5, keep.labels = TRUE)
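If you would rather keep the blanks as missing than give them a value like 0, another possible approach is to set them to NA first and then use a plain conversion. This is only a sketch in base R: `split5` below is a made-up stand-in for x$split5, and I have not checked how sjlabelled::as_numeric treats NA in this case.

```r
# Hypothetical stand-in for x$split5 (not the real data)
split5 <- c("2", "1", "11", "200", "")

# Mark the blanks as missing instead of replacing them with a number
split5[split5 == ""] <- NA

# Plain conversion keeps the values and leaves the missing entry as NA
split5_num <- as.numeric(split5)
print(split5_num)
```

This matches what the simple convert gives, without introducing a value such as 0 that could later be mistaken for real data.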
I hope @dannyparsons could suggest what to do about this result, but expect it could become a puzzle for @lilyclements? Here is a data frame called abundant: abundant.zip
It has character variables that need to be made numeric. In the Convert dialogue I put the variables split to split22.
The default for making them numeric is shown above. I get the correct answer from the convert by using the simple convert option. If I leave it with the default I get the seemingly odd result as follows:
This is the labelled convert, but not a set of data where I would usually want labels. The results are fine up to split4, but then the blanks caused by the split dialogue produce the odd labelled results from then on.
What should we do? This is not a dialogue I have ever liked, but am not sure what should be done - at least as a default for this sort of situation?
We could make the Simple Convert the default when converting to numeric. However, then the Labelled Convert becomes invisible, so you might not know it exists? And do we need those factor options in the dialogue, when converting from Character to Numeric?
I would not have noticed this problem if I could have used the right-click Convert to numeric more easily. It works fine, because I can do a Normal Convert, which is the same as the Simple Convert in the dialogue. Should we call it the same in both? And in the right-click option, could our new special addition to the Normal Convert and Labelled Convert buttons perhaps just be Apply All? Or maybe it could be a checkbox. If a checkbox, then it should probably not remember, but be unchecked each time.