ManuelNeumann / MNLpred

This package includes functions that return predictions from estimates of multinomial logistic models. The functions use models that were estimated with the multinom function from the nnet package.

Factor variables in mnl_pred_ova #1

Open janix9 opened 4 years ago

janix9 commented 4 years ago

Hi, I've been trying to plot marginal effects for a multinomial regression model. I tried using the mnl_pred_ova function, but I have several factor (binary) independent variables, which the function cannot handle yet. It would be very helpful to have this feature. (See also my post in Rcommunity.) Thanks, Janix

ManuelNeumann commented 4 years ago

As far as I understand, multinom() converts factor variables into dummies? I will definitely look into it!

ManuelNeumann commented 4 years ago

Hey @janix9, could you please provide a minimal working example (MWE) of how you normally use factors in multinom()?

janix9 commented 4 years ago

Hi @ManuelNeumann, I had already made this very minimal example. Is that useful?

library(nnet)  # multinom()

# stringsAsFactors = TRUE makes rating and case factors;
# R >= 4.0.0 would otherwise keep them as character vectors
df <- data.frame(rating = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium",
                            "3 Worse", "3 Worse", "3 Worse", "1 Better", "1 Better", "1 Better",
                            "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
                 count = c(2, 0, 5, 8, 10, 3, 2, 1, 0, 0, 9, 1, 0, 5, 7, 2, 9, 0),
                 case = c("Y", "N", "Y", "Y", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y"),
                 stringsAsFactors = TRUE)
fit <- multinom(rating ~ count + case,
                data = df)
summary(fit)
janix9 commented 4 years ago

PS: The point of having binary variables recognized as factors (as opposed to numeric variables) is to get proper visualizations that do not suggest the variables take continuous values. For example, continuing the example above:

library(effects)
pred1e <- effect("count", fit)
plot(pred1e)
pred2e <- effect("case", fit)
plot(pred2e)
ManuelNeumann commented 4 years ago

Thank you very much!

cassyld commented 4 years ago

Hi Manuel, I was able to reproduce my own factor-related error by building on the code above. In my case, if I run a multinomial model (using survey data, drat, so many factor/categorical variables!), I cannot use the first-difference function. Yet if I convert all the variables in my multinom model (excluding the DV) to numeric, the function works (and the results seem sensible).

library(MNLpred)
library(nnet)

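# dummy data
# stringsAsFactors = TRUE makes rating and case factors;
# R >= 4.0.0 would otherwise keep them as character vectors
temp <- data.frame(
  rating = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium",
             "3 Worse", "3 Worse", "3 Worse", "1 Better", "1 Better", "1 Better", "2 Medium",
             "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
  count = c(2, 0, 5, 8, 10, 3, 2, 1, 0, 0, 9, 1, 0, 5, 7, 2, 9, 0),
  case = c("Y", "N", "Y", "Y", "N", "Y", "N", "Y", "N", "N", "Y",
           "N", "Y", "N", "N", "Y", "N", "Y"),
  numDumb = seq(0, 1),  # recycled to 0, 1, 0, 1, ...
  stringsAsFactors = TRUE)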

In the data above, case is a factor and numDumb is a numeric variable.

Below, I pass the model frame as data because my real dataset contains a few NAs, and the model frame contains only the rows that were actually used to fit the model.

# model
temp$rating <- relevel(temp$rating, ref = "1 Better")
fit <- multinom(rating ~ count + case + numDumb,
                Hess = TRUE, data = temp)
summary(fit)
# first diff
tdataTemp <- data.frame(model.frame(fit))
pred2 <- mnl_fd2_ova(model = fit,
                     data = tdataTemp,
                     xvari = "numDumb",
                     value1 = min(tdataTemp$numDumb),
                     value2 = max(tdataTemp$numDumb),
                     nsim = 100)

This returns the following error, which is the same one I get in my own analysis:

Error in ovacases[, , i] %*% s: requires numeric/complex matrix/vector arguments
Traceback:

1. mnl_fd2_ova(model = fit, data = tdataTemp, xvari = "numDumb", 
 .     value1 = min(tdataTemp$numDumb), value2 = max(tdataTemp$numDumb), 
 .     nsim = 100)
2. apply(matrix(0, nrow = nsim, ncol = ncol(X)), 1, function(s) ovacases[, 
 .     , i] %*% s)
3. FUN(newX[, i], ...)
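A possible reading of this traceback (an assumption; the thread does not spell it out): model.frame(fit) drops incomplete rows but keeps case as a factor, and once a factor column ends up in a matrix, R stores the whole matrix as character, which %*% rejects with exactly this message. The sketch below reproduces that chain with small made-up data; the column names mirror the example above, and none of it calls MNLpred itself.

```r
library(nnet)  # multinom()

# Made-up data: one missing count, case stored as a factor
d <- data.frame(
  rating = factor(rep(c("1 Better", "2 Medium", "3 Worse"), times = 4)),
  count  = c(2, 0, 5, 8, NA, 3, 2, 1, 0, 0, 9, 1),
  case   = factor(rep(c("Y", "N"), times = 6))
)
fit <- multinom(rating ~ count + case, data = d, trace = FALSE)

mf <- data.frame(model.frame(fit))
nrow(mf)            # 11: the row with the missing count is dropped
sapply(mf, class)   # rating and case are still factors, count is numeric

# A matrix built from a numeric column plus a factor column is character:
X <- as.matrix(mf[, c("count", "case")])
is.character(X)     # TRUE

# Matrix multiplication then fails with the error seen in the traceback
res <- tryCatch(X %*% c(1, 1), error = function(e) conditionMessage(e))
res
```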

The only way I have been able to fix this is to go back to the beginning of the code above and force all factor variables to be numeric. In this case:

temp$case = as.numeric(temp$case)
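One detail worth knowing about this conversion: as.numeric() on a factor returns the internal level codes (1, 2, ...), not a 0/1 dummy, and on a character vector it returns NA. For a two-level factor the codes still differ by exactly 1, so the estimated slope is unchanged, but the values are 1 and 2, which matters when value1/value2 are set by hand. A small sketch with a made-up vector:

```r
# A two-level factor, as in the example above
case <- factor(c("Y", "N", "Y", "N"))
levels(case)               # "N" "Y" (alphabetical, so "N" is level 1)
as.numeric(case)           # 2 1 2 1: internal level codes, not 0/1
as.numeric(case == "Y")    # 1 0 1 0: a 0/1 dummy for "Y"

# A character vector (the data.frame() default since R 4.0.0) cannot be
# converted this way: as.numeric() returns NA with a warning
suppressWarnings(as.numeric(c("Y", "N")))   # NA NA
```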

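After doing this and refitting the model, I am able to run the code:

# refit with the numeric version of case
fit <- multinom(rating ~ count + case + numDumb,
                Hess = TRUE, data = temp)

# first diff
tdataTemp <- data.frame(model.frame(fit))
pred2 <- mnl_fd2_ova(model = fit,
                     data = tdataTemp,
                     xvari = "numDumb",
                     value1 = min(tdataTemp$numDumb),
                     value2 = max(tdataTemp$numDumb),
                     nsim = 100)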

And produce the plot: [screenshot, 2020-06-30 at 3:56:25 PM]

ManuelNeumann commented 4 years ago

@cassyld, your approach works just fine.

The problem is that the package's functions cannot handle non-numeric variables yet. The multinom() function converts them into dummies in the background. Emulating this behavior in my functions is my top priority for a 0.1.0 release.

My current recommendation is to code the dummies manually before fitting the model. As long as the variables don't have too many ordinal or nominal categories, this should be a feasible workaround.

To add to the presented example, my recommendation would look like this:

# Hand-coding dummy:
temp$case_Y <- as.numeric(temp$case == "Y")
table(temp$case, temp$case_Y)

fitb <- multinom(rating ~ count + case_Y + numDumb,
                 Hess=TRUE,
                 data = temp)

# first diff
pred2 = mnl_fd2_ova(model = fitb,
                    data = temp,
                    xvari = "numDumb",
                    value1= min(temp$numDumb),
                    value2= max(temp$numDumb),
                    nsim = 100)
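For factors with more than two categories, the hand-coding above can be generalized with base R's model.matrix(), which applies the default treatment contrasts (one 0/1 column per non-reference level), the same coding multinom() uses internally. The sketch below is an assumption-laden illustration, not part of MNLpred: the variable region and the simulated data are hypothetical. It checks that the factor fit and the hand-coded dummy fit give the same coefficients.

```r
library(nnet)  # multinom()

# Hypothetical data with a three-level factor "region"
set.seed(1)
n <- 90
d <- data.frame(
  rating = factor(sample(c("1 Better", "2 Medium", "3 Worse"), n, replace = TRUE)),
  count  = rpois(n, 4),
  region = factor(sample(c("North", "South", "West"), n, replace = TRUE))
)

# One 0/1 column per non-reference level; dropping the intercept column
# leaves "North" (the first level) as the reference category
dummies <- model.matrix(~ region, data = d)[, -1, drop = FALSE]
colnames(dummies)   # "regionSouth" "regionWest"
d2 <- cbind(d, dummies)

fit_factor <- multinom(rating ~ count + region, data = d, trace = FALSE)
fit_dummy  <- multinom(rating ~ count + regionSouth + regionWest,
                       data = d2, trace = FALSE)

# Same design matrix, so the estimates should agree
all.equal(unname(coef(fit_factor)), unname(coef(fit_dummy)))
```

The resulting numeric columns (here regionSouth and regionWest) can then be used with the package's functions like any other numeric variable.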
aseimel commented 3 years ago

Hi Manuel, I am running a multinomial model with a single dummy that identifies the treatment group in my causal design. The variable is a simple numeric vector, but I still get the "can not handle factor or character variables" error.

Edit: I found that the problem is the stop() check you implemented, which gets triggered even though my variable is numeric:

if (length(iv) > 1) {
  if (sum(apply(data[, iv], 2, class) %in% c("numeric", "integer")) < ncol(data[, iv])) {
    stop("Please supply data that consists of numeric values. The package can not handle factor or character variables, yet. For workarounds, please take a look at the github issues (https://github.com/ManuelNeumann/MNLpred/issues/1). The problem will hopefully be fixed with the 0.1.0 release.")
  }
} else {
  if ((class(data[, iv]) %in% c("numeric", "integer")) == FALSE) {
    stop("Please supply data that consists of numeric values. The package can not handle factor or character variables, yet. For workarounds, please take a look at the github issues (https://github.com/ManuelNeumann/MNLpred/issues/1). The problem will hopefully be fixed with the 0.1.0 release.")
  }
}

If I take this check out, it works fine. This might cause problems with other dummy variables, too.
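One plausible cause, assuming data is a tibble (as returned by many tidyverse and haven import functions; the thread does not confirm this): with a single independent variable, data[, iv] drops to a plain vector for a base data.frame but not for a tibble, so class() returns the data-frame classes instead of "numeric" and the check fails even for a numeric dummy. With a tibble, class() has length 3, which would also explain the "condition has length > 1" warning reported further down. The sketch mimics the non-dropping behavior with drop = FALSE and shows a hypothetical helper, all_numeric(), that tests column types directly; it is not the fix that was actually released.

```r
# Hypothetical helper (not part of MNLpred): TRUE if every column named in
# `iv` is numeric. data[iv] (list-style indexing) always keeps a data frame
# or tibble, so vapply() sees one element per column for any length(iv).
all_numeric <- function(data, iv) {
  all(vapply(data[iv], is.numeric, logical(1)))
}

df <- data.frame(event = c(0, 1, 1, 0),
                 group = factor(c("a", "b", "a", "b")))

# A base data.frame drops a single selected column to a vector ...
class(df[, "event"])                          # "numeric"
# ... but a tibble does not; drop = FALSE mimics that behavior
one_col <- df[, "event", drop = FALSE]
class(one_col)                                # "data.frame"
# so a class-based check rejects a perfectly numeric dummy
class(one_col) %in% c("numeric", "integer")   # FALSE

all_numeric(df, "event")                      # TRUE
all_numeric(one_col, "event")                 # TRUE
all_numeric(df, c("event", "group"))          # FALSE: group is a factor
```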

ManuelNeumann commented 3 years ago

Hey Armin,

can you please report back which class your dummy variable has, if it is not numeric or integer?

This would be super helpful to prevent the error showing up unnecessarily for other users as well.

aseimel commented 3 years ago

I made a simplified example of my code. I wanted to use the package to estimate the difference in predicted probabilities between my treatment and control groups.

library(tidyverse)
library(MNLpred)
library(nnet)

data <- data_raw %>%
  transmute(date = dDatum,
            ref_par = recode_factor(as.numeric(ZJ22013),
                                    `1` = "Refugee",
                                    `2` = "Refugee",
                                    `3` = "Refugee",
                                    `4` = "Both",
                                    `5` = "Security",
                                    `6` = "Security"),
            ref_par = relevel(ref_par, ref = "Both"),
            day = lubridate::day(date),
            event = if_else(day >= 19, 1, 0))
is(data$event)

[1] "numeric" "vector" "index" "replValue" "numLike"
[6] "number" "atomicVector" "numericOrNULL" "numeric or NULL" "replValueSp"

m <- multinom(ref_par ~ event, data = data, Hess = TRUE)
fd <- mnl_fd2_ova(model = m, data = data,
                  x = "event",
                  value1 = 0,
                  value2 = 1,
                  nsim = 100)

Error in mnl_fd2_ova(model = m, data = data, x = "event", value1 = 0, : Please supply data that consists of numeric values. The package can not handle factor or character variables, yet. For workarounds, please take a look at the github issues (https://github.com/ManuelNeumann/MNLpred/issues/1). The problem will hopefully be fixed with the 0.1.0 release.
In addition: Warning message:
In if ((class(data[, iv]) %in% c("numeric", "integer")) == FALSE) { :
  the condition has length > 1 and only the first element will be used

ManuelNeumann commented 3 years ago

Can you run just class(data$event), please?

aseimel commented 3 years ago

Sure. If I run class(data$event), the output is:

[1] "numeric"

I have this problem whenever I run a model with a single treatment dummy that is expressed as a numeric vector of zeros and ones. If I add another variable to the model, the issue no longer occurs.

ManuelNeumann commented 3 years ago

Thanks, I see where the issue is. I will fix the bug in the next few days.

ManuelNeumann commented 3 years ago

@aseimel Can you update the package and try it again, please?

aseimel commented 3 years ago

Sorry for the late reply. It works perfectly now, thank you!

abgok commented 2 years ago

Hi, just checking whether the 0.1.0 release that fixes this issue is coming soon? Thanks so much in advance.

ManuelNeumann commented 2 years ago

Hey @abgok, unfortunately, I currently do not have the time or other resources to work on issues that have a workaround. Sorry about that!