Open simonbyrne opened 10 years ago
As argued on the list, I don't see the problem in extending the Poisson distribution to include λ = 0. I doubt that skewness/kurtosis will be used for anything in the degenerate case anyway, so whether they return `Inf`, `NaN`, or an error is less important to me. I think the limits (`Inf`) are nicest, and that it is okay for degenerate distributions to return different results for skewness/kurtosis depending on which distribution they degenerate from.
I'd say let's wait for actual demand before changing `Beta` and `Normal`. Going from continuous to discrete is a bit more dramatic, so it is maybe not as useful as the `Poisson` case.
That seems like a reasonable idea. This would also match the handling of `Binomial`.
I agree with @andreasnoack. As long as the limits are clearly defined, there's no reason to raise an error.
As the one who brought it up, I'm for the change. It's a more useful result than an error (at least to me and other modellers who are likely to encounter it), and it makes sense in the limit λ → 0. It also matches the behaviour in Matlab, Octave, R, and the GSL (although not Scipy).
The mailing list thread had a question about the limiting skewness of Poisson distributions. One can do a more careful derivation, but empirically it looks like the limit is well defined as positive infinity:
```julia
julia> for i = 1:15
           P = Poisson(10.0^-i)
           println(i, "\t", skewness(P))
       end
1	3.162277660168379
2	10.0
3	31.622776601683796
4	100.0
5	316.2277660168379
6	1000.0
7	3162.277660168379
8	10000.0
9	31622.776601683792
10	99999.99999999999
11	316227.76601683797
12	1.0e6
13	3.1622776601683795e6
14	1.0e7
15	3.1622776601683795e7
```
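The careful derivation is in fact short: for Poisson(λ), the third central moment equals λ and the variance is λ, so the skewness is λ / λ^(3/2) = λ^(-1/2), which matches the table (row i gives 10^(i/2)) and diverges to +Inf as λ → 0. A minimal check of that closed form, using only plain Julia (no dependency on the package's `skewness`):

```julia
# Closed-form skewness of Poisson(λ): third central moment λ divided by
# σ³ = λ^(3/2), which simplifies to λ^(-1/2) and diverges as λ → 0.
poisson_skewness(λ) = 1 / sqrt(λ)

poisson_skewness(1e-2)   # ≈ 10.0, as in row i = 2 above
poisson_skewness(0.0)    # Inf: the limiting value under discussion
```

So returning `Inf` at λ = 0 really is the value of the limit, not a numerical accident.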
As the person who's most worried about this proposed change, I'd like to argue for making this kind of change much more systematically. What worried me about the original proposal is that it seemed to only offer a small bit of convenience at the potential expense of formal correctness in lots of other computations.
In general, I've come to strongly prefer making decisions about core packages based on sweeping design principles that dictate how the package should behave no matter what specific case is being considered. As @simonbyrne points out, there are many other boundary cases we should consider before deciding to allow `Poisson(0)`.
For all of those cases, we might adopt the design principle that whenever a boundary condition exists and has a clear well-defined limit, we adopt the value at the limit as the value for that boundary condition. In particular, whenever a boundary condition is equivalent to a Dirac measure, we produce outputs equivalent to those we would produce for a hypothetical Dirac measure distribution type.
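As a concrete illustration of that principle (entirely hypothetical; no such type existed in the package at the time, and all names below are made up), a boundary case like `Poisson(0)` or `Normal(μ, 0)` could delegate its queries to a point-mass type:

```julia
# Hypothetical sketch of a Dirac (point-mass) distribution that boundary
# cases could delegate to. Not the Distributions.jl API.
struct Dirac
    x::Float64
end

mean(d::Dirac) = d.x
var(d::Dirac) = 0.0
massat(d::Dirac, x) = x == d.x ? 1.0 : 0.0
skewness(d::Dirac) = NaN   # one possible convention; Inf is another
```

With something like this in place, "value at the boundary = value of the Dirac limit" becomes a mechanical rule rather than a per-distribution judgment call.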
If we do adopt that kind of design principle, I'd like to make sure we apply it systematically and don't wait for someone to complain about inconsistencies in how we handle different distributions.
I suspect this principle could affect a lot of other distributions.
So I'd say that, if we're going to embrace boundary conditions, we should really embrace them and figure out how this design principle would impact everything in Distributions.
Okay, let me try to break the filibuster attempt. If we were paid to spend all our time on Distributions, I'd think your proposal reasonable. However, our resources are scarce, so we should devote them where they are most useful, and I don't think that leaves much time to spend going through all the methods of the Gumbel distribution for a zero scale parameter.
@spaceLem proposed a small change to the Poisson distribution which would make it a bit easier to use in an application and I don't believe that the change will give problems elsewhere.
A compromise could be to extend the discrete distributions only. I think it makes sense because, as argued on the list, the change from continuous to a point measure is more dramatic and, I think, less relevant.
Another way to look at this issue is in terms of when to signal a problem for certain values of distribution parameters. Some values are useless all around and should cause an error immediately. Others, like those being discussed here, are fine or not depending on the question one then asks of the resulting distribution object. In such cases, it seems reasonable, and in line with Julia's dynamic nature, to allow construction and sensible queries and to defer errors until the wrong question is actually asked. For a lot of these questions there's also an arguably correct non-finite answer.
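A sketch of that deferred-error idea (hypothetical names, not the package's actual types): accept λ = 0 at construction, answer the well-posed queries, and only complain, or return a non-finite value, for the ill-posed ones:

```julia
# Hypothetical sketch: accept λ = 0 at construction and defer any error
# to the query that actually lacks a sensible answer.
struct ToyPoisson
    λ::Float64
    ToyPoisson(λ) = λ >= 0 ? new(λ) : throw(ArgumentError("λ must be nonnegative"))
end

# The pmf is well defined at λ = 0 because Julia evaluates 0.0^0 as 1.0,
# so all the mass sits on k = 0.
pmf(d::ToyPoisson, k::Integer) = exp(-d.λ) * d.λ^k / factorial(k)
```

Here `pmf(ToyPoisson(0.0), 0)` is 1 and the distribution behaves as a point mass at zero, with no special-casing needed.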
Also, there's a middle ground between handling cases in ad hoc fashion and implementing it all at once, consistently: figure out a good principle and implement some cases, but don't try to deal with all of them right away.
That's what I was going to suggest. @johnmyleswhite's criteria are good, but we can wait for actual use cases to come up before implementing them. Starting with common cases is a good strategy.
Yeah, having a coherent policy means that any time the issue comes up, everyone knows what to do.
I have no problem with allowing zero rate for Poisson. However, allowing zero scale for continuous distributions seems to be a more complex problem. Dealing with atomic distributions is nontrivial. How can you tell an infinite density with probability mass 1.0 from one with probability mass 0.5?
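One answer, as an illustration rather than a proposal for the package API: the density can't distinguish atoms, but the cdf can, because its jump at the atom equals the atom's mass. Comparing δ₀ with the mixture 0.5·δ₀ + 0.5·Uniform(0,1):

```julia
# cdf of 0.5·δ₀ + 0.5·Uniform(0,1): a jump of 0.5 at zero, then a ramp.
cdf_mix(x) = 0.5 * (x >= 0) + 0.5 * clamp(x, 0.0, 1.0)
# cdf of the pure point mass δ₀: a jump of 1.0 at zero.
cdf_dirac(x) = float(x >= 0)

# The cdf jump at the atom recovers its mass, even though both
# "densities" are infinite there.
jump(cdf) = cdf(0.0) - cdf(-eps())
```

So any representation of atomic distributions would have to be cdf-based (or carry explicit atom masses) rather than density-based, which is part of why the continuous zero-scale case is harder than the discrete one.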
Also, now Poisson distributions depend on Rmath. Does R support zero rate?
```r
> rpois(10, 0)
 [1] 0 0 0 0 0 0 0 0 0 0
```
I've submitted pull request #398 to fix the Poisson(0) issue.
The Poisson(0) issue is now fixed since pull requests #398 and #401 have been merged.
I would be keen to allow `Normal` with std dev = 0, as it comes up fairly often.
I would very much like to incorporate `Geometric` with success parameter `p = 1`. In a simulation study of response-adaptive treatment allocation, I encountered an error relating to the check `zero(p) < p < one(p)` when an allocation probability becomes 1. Moreover, I think including `p = 1` would generalise this distribution a bit more. I am willing to create a PR if this seems a satisfactory addition to the people here...
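For what it's worth, the `p = 1` case is well behaved for the pmf itself: with pmf p·(1−p)^k (counting failures before the first success), p = 1 collapses to a point mass at k = 0, again relying on Julia's convention that `0.0^0 == 1.0`. A standalone check, independent of the package:

```julia
# Geometric pmf (number of failures before the first success).
# At p = 1 all mass lands on k = 0; the zero(p) < p < one(p) check
# mentioned above is what currently rules this case out.
geom_pmf(p, k) = p * (1 - p)^k
```

So, like `Poisson(0)`, `Geometric(1)` degenerates to a Dirac measure rather than to anything pathological.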
How should we handle parameters which lie on the boundary? e.g.

- `Poisson(0)` (see original discussion) FIXED!
- `Normal(0,0)`
- `Beta(a,0)` and `Beta(0,b)` for `a,b > 0`.

In most cases the limits end up being Dirac measures, though in some cases there can be ambiguity (e.g. `Beta(0,0)`). If we do include these, we also need to decide how to handle `skewness`/`kurtosis`: presumably either `NaN` or throw an error.

Update: Don't allow this for continuous distributions. Distributions that need updating:

- `Poisson`
- `Geometric`
- `NegativeBinomial`