Nosferican / Econometrics.jl

Econometrics in Julia
https://nosferican.github.io/Econometrics.jl/dev
ISC License

Ordinal logit regression: cutoff ordering during optimization #81

Open Rubaiyat-Alam opened 2 years ago

Rubaiyat-Alam commented 2 years ago

Hello,

I'm running the ordinal response model in Econometrics.jl (which corresponds to Stata's ologit) and am running into an error. For certain specifications, fitting the ologit fails with an error saying that the log of a negative value is being taken. The same specification works fine in Stata. I coded up the ologit in Julia, and it looks like the ordering of the cutoffs is not being maintained during the optimization process. I explain in detail below.

In ordinal logit we must have cutoff_1 < cutoff_2 < cutoff_3 < ... and so on. This ordering must hold at all times during the optimization because Pr(d = j | X) = F(cutoff_j ... | X) - F(cutoff_{j-1} ... | X), so if cutoff_{j-1} > cutoff_j, then Pr(d = j | X) becomes negative. Since ologit estimation requires taking logs of probabilities, this can end up taking the log of a negative value, which throws an error.
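To make the failure concrete, here is a minimal sketch in Python (not the package's code). It assumes the standard form in which F is the logistic CDF evaluated at cutoff minus a linear index; the names `category_probs` and `xb` are just for illustration.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF, F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(cutoffs, xb):
    """Assumed form: Pr(d = j | X) = F(cutoff_j - xb) - F(cutoff_{j-1} - xb),
    with the outermost cutoffs taken as -inf and +inf."""
    cdf = [0.0] + [logistic_cdf(c - xb) for c in cutoffs] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]

# Cutoffs in increasing order: every category probability is positive.
ordered = category_probs([-1.0, 0.5, 2.0], xb=0.3)

# Second and third cutoffs swapped: the third probability is negative,
# so math.log(unordered[2]) would raise a ValueError (math domain error).
unordered = category_probs([-1.0, 2.0, 0.5], xb=0.3)
```

Both lists still sum to one because the differences telescope, so the problem only shows up once the log of each probability is taken.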

What happens when I run ologit is that as the optimizer perturbs over differing values of the cutoff, it sometimes uses trial values where cutoff_{j - 1} > cutoff_j, which throws an error of logs being taken of a negative value.

My guess is that the optimization problem for ologit needs to include a constraint of the cutoffs following the right ordering, and that will be enough.

I'm not attaching a MWE that showcases the error - I run into this error for my research dataset and haven't constructed a fake dataset that recreates this. If the above isn't clear please let me know, I'll try to come up with a MWE. I can also post screenshots of the error (which basically says log of a negative value is being taken) if it helps.

And thanks for creating and updating the package in general! Let me know if more info is needed.

Nosferican commented 2 years ago

Hi @Rubaiyat-Alam, is this something you are experiencing with the package or in your own implementation of ordinal logistic regression? The key to avoiding that issue is a nice optimization trick: https://github.com/Nosferican/Econometrics.jl/blob/014636502fcc09179dbd8eab58e82c9ff07872e2/src/solvers.jl#L259 Basically, the parameter that the model optimizes is the log of the partial addition. In other words, when recovering the thresholds, they are guaranteed to be monotonically increasing because you take the cumulative sum of positive values, and positivity is ensured by exponentiating the parameter estimates.
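A minimal sketch of this kind of reparameterization, written in Python and not taken from the package's solver: it assumes the first threshold is left unconstrained and each later threshold adds the exponent of one parameter. The function names are hypothetical.

```python
import math
from itertools import accumulate

def thresholds_from_params(params):
    """Map unconstrained parameters to strictly increasing thresholds.

    Assumption for this sketch: params[0] is the first threshold itself,
    and params[1:] are logs of the gaps between consecutive thresholds.
    """
    first, *log_gaps = params
    steps = [first] + [math.exp(g) for g in log_gaps]  # every gap is > 0
    return list(accumulate(steps))                     # cumulative sum

def params_from_thresholds(thresholds):
    """Inverse map; requires the thresholds to be strictly increasing."""
    gaps = [b - a for a, b in zip(thresholds, thresholds[1:])]
    return [thresholds[0]] + [math.log(g) for g in gaps]

# Any real-valued trial point the optimizer proposes gives ordered thresholds.
params = [3.0, -5.0, 2.0, -0.1]
thresholds = thresholds_from_params(params)
```

With this mapping the optimizer can search over all real values without an explicit ordering constraint, since no trial point can produce cutoffs out of order.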

Let me know if that helps in what you are asking.

Rubaiyat-Alam commented 2 years ago

Hi,

Thanks for the reply. It's something I'm experiencing with the package, not my own implementation. I'm attaching an image showing the error. As you can see an error seems to be thrown with regards to Optim.

[Image: ologit]

Nosferican commented 2 years ago

Could you share a sample dataset that triggers the error to help me debug this?

Nosferican commented 1 year ago

I might have a bit of time to look into it if you can share an MRE. Thanks!