JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.

Cauchy distribution: biased estimator of scale #1570

Open nalewkoz opened 2 years ago

nalewkoz commented 2 years ago

The currently available code for fitting the parameters of the univariate Cauchy distribution is not based on the MLE and gives a biased estimate of the scale. I wrote my own code that iteratively solves the set of nonlinear equations for the MLE, and as far as I understand, this gives an unbiased estimator. Could this be useful? Should I prepare a pull request?
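
For illustration, one way to solve the Cauchy likelihood equations iteratively is an EM-style fixed-point update that treats the Cauchy as a Student t distribution with one degree of freedom. The following is a minimal Python sketch under that assumption. It is not the code referred to above and not part of Distributions.jl, and the name cauchy_mle is hypothetical.

```python
import math
import random
import statistics


def cauchy_mle(xs, tol=1e-10, max_iter=10_000):
    """Joint MLE of Cauchy location and scale (EM-style fixed-point sketch)."""
    n = len(xs)
    # Start from robust estimates: the median and half the interquartile range.
    q1, med, q3 = statistics.quantiles(xs, n=4)
    mu, sigma = med, (q3 - q1) / 2
    if sigma <= 0:
        raise ValueError("need a sample with a positive interquartile range")
    for _ in range(max_iter):
        # Weights shrink towards zero for points far from the current location.
        w = [2.0 / (1.0 + ((x - mu) / sigma) ** 2) for x in xs]
        new_mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        new_sigma = math.sqrt(
            sum(wi * (x - new_mu) ** 2 for wi, x in zip(w, xs)) / n
        )
        step = abs(new_mu - mu) + abs(new_sigma - sigma)
        mu, sigma = new_mu, new_sigma
        if step < tol * (1.0 + sigma):
            break
    return mu, sigma


if __name__ == "__main__":
    random.seed(0)
    # Cauchy(location=1, scale=2) draws via the inverse CDF.
    sample = [1.0 + 2.0 * math.tan(math.pi * (random.random() - 0.5))
              for _ in range(2000)]
    print(cauchy_mle(sample))
```

At a fixed point of this update both likelihood equations hold: the sum of 2(x - mu) / (sigma^2 + (x - mu)^2) is zero, and the sum of (x - mu)^2 / (sigma^2 + (x - mu)^2) equals n/2.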

ParadaCarleton commented 11 months ago

I'm a bit confused as to what you mean here. fit should always default to MLE (if it doesn't, that's a bug), but isn't the MLE for the Cauchy scale slightly biased?

devmotion commented 11 months ago

> fit should always default to MLE (if it doesn't, that's a bug),

No, that's not correct. Usually it's MLE but it does not have to be (stated also in the docs: https://juliastats.org/Distributions.jl/latest/fit/).

ParadaCarleton commented 11 months ago

> No, that's not correct. Usually it's MLE but it does not have to be (stated also in the docs: https://juliastats.org/Distributions.jl/latest/fit/).

Sorry, I meant it should default to fit_mle unless another fit function exists.

But I don't see another fit function documented for Cauchy; aren't the docs supposed to say if another method is defined for fit?

nalewkoz commented 11 months ago

I am not really sure if the joint MLE of location and scale of the Cauchy distribution is biased. From my numerical experiments it looks like the bias of the scale estimator, if any, is tiny -- definitely much lower than the bias of the method currently implemented in fit. That said, the MLE is also more computationally demanding, so it is probably a good idea not to use the MLE as default here.
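
A numerical experiment of this kind can be sketched in a few lines. The thread does not show the method behind fit, so the Python sketch below assumes a quartile-based rule that takes the larger of the two distances from the median to the quartiles, and compares it with half the interquartile range. Both rules and all function names are illustrative assumptions, not the library's code.

```python
import math
import random
import statistics


def draw_cauchy(rng, n, loc=0.0, scale=1.0):
    """Draw n Cauchy variates by inverting the CDF."""
    return [loc + scale * math.tan(math.pi * (rng.random() - 0.5))
            for _ in range(n)]


def quartile_scale_estimates(xs):
    """Return (max-rule, half-IQR) scale estimates from the sample quartiles."""
    lower, med, upper = statistics.quantiles(xs, n=4)
    # Assumed rule: the larger distance from the median to a quartile.
    max_rule = max(med - lower, upper - med)
    half_iqr = (upper - lower) / 2
    return max_rule, half_iqr


def mean_scale_estimates(n=25, reps=2000, seed=0):
    """Average both estimates over repeated samples with true scale 1."""
    rng = random.Random(seed)
    max_rule, half_iqr = [], []
    for _ in range(reps):
        a, b = quartile_scale_estimates(draw_cauchy(rng, n))
        max_rule.append(a)
        half_iqr.append(b)
    return statistics.fmean(max_rule), statistics.fmean(half_iqr)


if __name__ == "__main__":
    mean_max, mean_half = mean_scale_estimates()
    print(f"max rule: {mean_max:.3f}, half IQR: {mean_half:.3f}, true scale: 1")
```

Because max(a, b) is never smaller than (a + b) / 2, the max rule can only sit at or above the half-IQR estimate on every sample, which is one way an upward bias in the scale could arise.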

ParadaCarleton commented 11 months ago

> I am not really sure if the joint MLE of location and scale of the Cauchy distribution is biased. From my numerical experiments it looks like the bias of the scale estimator, if any, is tiny -- definitely much lower than the bias of the method currently implemented in fit. That said, the MLE is also more computationally demanding, so it is probably a good idea not to use the MLE as default here.

Oh, I'm sure the bias is small and a lot lower; I was just mentioning it since the MLE is not technically unbiased. I think MLE would be a good default here: it's more computationally demanding, but on modern computers the cost is still small unless you're working with truly massive datasets (in which case you're probably not just using the off-the-shelf fit method).