AnestisTouloumis / multgee

GEE solver for correlated nominal or ordinal multinomial responses using a local odds ratios parameterization.
https://CRAN.R-project.org/package=multgee

ordLORgee, estimate in opposite direction than expected #4

Closed gautig closed 2 years ago

gautig commented 2 years ago

Dear Anestis Touloumis

First, I want to thank and congratulate you for this wonderful R package, multgee.

I am running an analysis on data collected with questionnaires, where each individual has answered multiple times. The response can be formulated either as a linear score ranging from 1 to around 30 or as an ordinal variable with 4 to 7 categories (depending on the questionnaire). For the linear analysis I used geeglm from the geepack package. For the categorical analysis I first used ordgee, but that gave inconsistent results: sometimes it did not converge or gave other errors. I therefore turned to your package multgee and its function ordLORgee.

ordLORgee always converges and returns results, but the estimates for all significant variables have the opposite sign to those from the other analyses, such as geeglm. Where I expected negative estimates, I now get positive ones.

I made sure that my ID is a number ranging from 1 to n and that the data are ordered by ID. The response is an ordered factor; I also tested casting the response to a number. The covariates are age and gender. The variable of interest is the event, which only happens to some participants.

Here is example data (not real data):

ID2  gad_score_fct    sex     age_at_questionnaire  event  time
1    minimal anxiety  male    66                    0      0
1    minimal anxiety  male    70                    0      1
2    minimal anxiety  male    61                    0      0
2    minimal anxiety  male    64                    0      1
3    mild anxiety     female  54                    0      0
3    mild anxiety     female  57                    0      1

Can you see anything that might be causing this?

Best,
Gauti

AnestisTouloumis commented 2 years ago

Hi Gauti,

It is unclear to me whether you are comparing ordLORgee with the results from ordgee or geeglm functions, so I consider both cases:

  1. Comparison with ordgee: There seems to be a bug in ordgee, so I would not use this function. See Introduction Section
  2. Comparison with geeglm: The sign depends on how the response categories are coded in the categorical model and how the scores are coded in the linear model (does a higher score imply more anxiety?). The key is to focus on the interpretation and not on the sign. For example, consider the following code, which uses the dataset arthritis from multgee:

    library("multgee")
    data("arthritis")
    fitmod <- ordLORgee(formula = y ~ factor(trt), data = arthritis,
                        id = id, repeated = time, LORstr = "independence")
    coef(fitmod)

  beta10       beta20       beta30       beta40 factor(trt)2 
  -3.021       -1.038        0.698        2.656       -0.521 

    library("geepack")
    fitmod_gee <- geeglm(formula = y ~ factor(trt), data = arthritis,
                         id = id, corstr = "independence")
    coef(fitmod_gee)

  (Intercept) factor(trt)2 
    3.079        0.296 

In this example, the estimated regression coefficients for the treatment have different signs, but the interpretation is similar. In the multinomial GEE model (ordLORgee), the estimated coefficient is negative, which means that the cumulative odds of being in a fixed response category or below are lower for the treatment; that is, responses shift toward the higher categories (hence the treatment is better than the placebo). In the linear model (geeglm), the treatment increases the mean score of the response variable (again, the treatment is better than the placebo).

I hope this helps. Just keep in mind that the above is not a mathematical proof, and I cannot prove that this holds for all datasets.
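
The sign argument can be checked numerically with base R alone. The sketch below is only an illustration, not part of multgee: it plugs the ordLORgee estimates printed in this thread into a cumulative logit model and computes the implied category probabilities and mean score for each treatment group. It assumes the model has the form logit P(Y <= j) = beta_j0 + beta * x and that the five categories are scored 1 to 5; the helper names `cum_prob`, `cat_prob`, and `mean_score` are made up for this example.

```r
# Estimates copied from the ordLORgee output shown in this thread
cutpoints <- c(-3.021, -1.038, 0.698, 2.656)  # beta10, ..., beta40
beta_trt  <- -0.521                           # factor(trt)2

# Assumed model: logit P(Y <= j) = beta_j0 + beta_trt * x,
# with x = 0 for trt 1 and x = 1 for trt 2
cum_prob <- function(x) plogis(cutpoints + beta_trt * x)

# Convert the four cumulative probabilities into five category probabilities
cat_prob <- function(x) diff(c(0, cum_prob(x), 1))

# Assumed scoring: categories coded 1 (lowest) to 5 (highest)
mean_score <- function(x) sum(1:5 * cat_prob(x))

cum_trt1  <- cum_prob(0)
cum_trt2  <- cum_prob(1)
mean_trt1 <- mean_score(0)
mean_trt2 <- mean_score(1)

print(round(rbind(trt1 = cum_trt1, trt2 = cum_trt2), 3))  # P(Y <= j) by group
print(c(trt1 = mean_trt1, trt2 = mean_trt2))              # implied mean scores
print(exp(beta_trt))  # cumulative odds ratio, trt 2 versus trt 1
```

Under these assumptions, the negative slope lowers every cumulative probability for trt 2, which moves probability toward the higher categories and raises the implied mean score. That is the same direction as the positive geeglm coefficient.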

gautig commented 2 years ago

Hi Anestis

Thank you very much for your answer.

I was comparing ordLORgee to both geeglm and ordgee, but I will not use ordgee because of the known bugs (as you mention) and also because it does not work with glht, which I use to extract results for interaction terms.

I still don't understand why the sign of the estimate differs while the interpretation stays the same. I am even tempted to reverse my ordered outcome so that I get estimates with the sign I am used to.
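
Reversing the ordered outcome can be sketched in base R as follows. This is a hedged illustration rather than a recommendation from the package author: the level names beyond the two that appear in the example data are assumed, and the sign check reuses the arthritis estimates quoted earlier in the thread under an assumed cumulative logit model, logit P(Y <= j) = beta_j0 + beta * x.

```r
# Ordered response; "moderate anxiety" and "severe anxiety" are assumed levels
lev <- c("minimal anxiety", "mild anxiety", "moderate anxiety", "severe anxiety")
gad <- factor(c("minimal anxiety", "minimal anxiety", "mild anxiety"),
              levels = lev, ordered = TRUE)

# Reverse the ordering of the levels while keeping the labels
gad_rev <- factor(gad, levels = rev(levels(gad)), ordered = TRUE)
print(levels(gad_rev))

# Why the sign flips: with estimates quoted in this thread,
cutpoints <- c(-3.021, -1.038, 0.698, 2.656)
beta      <- -0.521
cum <- function(x) plogis(cutpoints + beta * x)  # P(Y <= j | x)

# under the reversed coding, P(Y_rev <= j) = 1 - P(Y <= K - j)
cum_rev <- function(x) rev(1 - cum(x))

# Slope on the cumulative logit scale for the reversed response
slope_rev <- qlogis(cum_rev(1)) - qlogis(cum_rev(0))
print(slope_rev)
```

Under the assumed model, the slope for the reversed response equals minus the original slope at every cut point. Refitting on a reversed factor would therefore be expected to change the sign of the estimate while describing the same effect.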