JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.

Boundary parameter handling #283

Open simonbyrne opened 10 years ago

simonbyrne commented 10 years ago

How should we handle parameters which lie on the boundary? e.g.

In most cases the limits end up being Dirac measures, though in some cases there can be ambiguity (e.g. Beta(0,0)).

If we do include these, we also need to decide how to handle skewness/kurtosis: presumably either NaN or throw an error.

Update: Don't allow this for continuous distributions. Distributions that need update:

andreasnoack commented 10 years ago

As argued on the list, I don't see the problem in extending the Poisson distribution to include λ=0. I doubt that skewness/kurtosis will be used for anything in the degenerate case anyway so whether they return Inf, NaN or an error is less important to me. I think the limits (Inf) are nicest and that it is okay that degenerate distributions return different results for skewness/kurtosis depending on which distribution they degenerate from.

I'd say let's wait for demand before changing Beta and Normal. Going from a continuous distribution to a discrete one is a bit more dramatic, so it may not be as useful as the Poisson case.

simonbyrne commented 10 years ago

That seems like a reasonable idea. This would also match the handling of Binomial.

nalimilan commented 10 years ago

I agree with @andreasnoack. As long as the limits are clearly defined, there's no reason to raise an error.

spaceLem commented 10 years ago

As the one who brought it up, I'm for the change. It's a more useful result than an error (at least to me and other modellers who are likely to encounter it), and it makes sense in the limit lambda -> 0. Also it matches behaviour in Matlab, Octave, R, and the GSL (although not Scipy).

jiahao commented 10 years ago

The mailing list thread had a question about the limiting skewness of Poisson distributions. One can do a more careful derivation, but empirically it looks like the limit is well defined as positive infinity:

julia> for i=1:15
        P = Poisson(10.0^-i)
        println(i,"\t", skewness(P))
       end
1   3.162277660168379
2   10.0
3   31.622776601683796
4   100.0
5   316.2277660168379
6   1000.0
7   3162.277660168379
8   10000.0
9   31622.776601683792
10  99999.99999999999
11  316227.76601683797
12  1.0e6
13  3.1622776601683795e6
14  1.0e7
15  3.1622776601683795e7
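
The more careful derivation is short, assuming the standard closed forms for the Poisson moments (mean, variance and third central moment all equal to λ). This is a sketch, not something taken from the package source:

$$
\operatorname{skewness}(X) = \frac{\mathbb{E}\left[(X-\lambda)^3\right]}{\operatorname{Var}(X)^{3/2}} = \frac{\lambda}{\lambda^{3/2}} = \lambda^{-1/2} \longrightarrow +\infty \quad \text{as } \lambda \to 0^{+}.
$$

This agrees with the table above: λ = 10^-i gives a skewness of 10^(i/2). The excess kurtosis is 1/λ, which also diverges to positive infinity.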

johnmyleswhite commented 10 years ago

As the person who's most worried about this proposed change, I'd like to argue for making this kind of change much more systematically. What worried me about the original proposal is that it seemed to only offer a small bit of convenience at the potential expense of formal correctness in lots of other computations.

In general, I've come to strongly prefer making decisions about core packages based on sweeping principles of design that dictate how the package should behave no matter what specific case is being considered. As @simonbyrne points out, there are many other boundary cases we should consider before deciding to allow Poisson(0).

For all of those cases, we might adopt the design principle that whenever a boundary condition exists and has a clear well-defined limit, we adopt the value at the limit as the value for that boundary condition. In particular, whenever a boundary condition is equivalent to a Dirac measure, we produce outputs equivalent to those we would produce for a hypothetical Dirac measure distribution type.
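
To make the principle concrete, here is a minimal sketch of what "outputs equivalent to a hypothetical Dirac measure" could mean for Poisson(0). The names `PointMass`, `point_pmf`, `point_cdf` and `poisson_pmf` are made up for illustration and are not part of Distributions:

```julia
# Hypothetical point-mass type, for illustration only.
struct PointMass
    x::Float64   # location of the single atom
end

# All of the probability mass sits at d.x.
point_pmf(d::PointMass, k::Real) = k == d.x ? 1.0 : 0.0
# The CDF is a unit step at d.x.
point_cdf(d::PointMass, k::Real) = k >= d.x ? 1.0 : 0.0

# Textbook Poisson pmf (small k only, since factorial overflows for k > 20).
# No special case is needed at λ = 0: exp(-0.0) == 1.0, 0.0^0 == 1.0,
# and 0.0^k == 0.0 for k > 0.
poisson_pmf(λ::Real, k::Integer) = exp(-λ) * λ^k / factorial(k)

# The boundary case agrees with the point mass at zero.
for k in 0:5
    println(k, "\t", poisson_pmf(0.0, k), "\t", point_pmf(PointMass(0.0), k))
end
```

The point of the comparison is that the limit is not a separate code path here: the ordinary formula already produces the point-mass values at the boundary.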

If we do adopt that kind of design principle, I'd like to make sure we apply it systematically and not wait for someone to complain about inconsistencies in how we handle different distributions.

I suspect this principle could affect a lot of other distributions, including at least:

So I'd say that, if we're going to embrace boundary conditions, we should really embrace them and figure out how this design principle would impact everything in Distributions.

andreasnoack commented 10 years ago

Okay, let me try to break the filibuster attempt. If we were paid to spend all our time on Distributions, I think your proposal would be reasonable. However, our resources are scarce, so we should devote them where they are most useful. I don't think this allows much time to be spent on going through all the methods of the Gumbel distribution for a zero scale parameter.

@spaceLem proposed a small change to the Poisson distribution which would make it a bit easier to use in an application and I don't believe that the change will give problems elsewhere.

A compromise could be to extend the discrete distributions only. I think it makes sense because, as argued on the list, the change from continuous to a point measure is more dramatic and, I think, less relevant.

StefanKarpinski commented 10 years ago

Another way to look at this issue is to ask when to indicate a problem for certain values of distribution parameters. There are some values that are all-around useless and should cause an error immediately. Others, like those being discussed here, seem to be ok or not depending on the question one then asks about the resulting distribution object. In such cases, it seems reasonable and in line with Julia's dynamic nature to allow construction and sensible questions, and to defer errors until the wrong question is actually asked. It also seems like for a lot of these questions there's an arguably correct non-finite answer.
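
A rough sketch of that idea, using a made-up `RateCount` type rather than the package's own Poisson (all names below are hypothetical): construction rejects only the useless values, the sensible questions always work, and the awkward question either returns the non-finite limit or throws when it is asked.

```julia
# Hypothetical type, for illustration only; not the package's Poisson.
struct RateCount
    λ::Float64
    function RateCount(λ::Real)
        # Values that are useless everywhere are rejected at construction.
        (isnan(λ) || λ < 0) && throw(ArgumentError("λ must be non-negative, got $λ"))
        new(λ)
    end
end

# Sensible questions are well defined for every λ ≥ 0.
rate_mean(d::RateCount) = d.λ
rate_var(d::RateCount) = d.λ

# Option 1: return the limiting value, which is Inf at λ = 0.
rate_skewness(d::RateCount) = 1 / sqrt(d.λ)

# Option 2: defer the error until the question is actually asked.
function rate_skewness_strict(d::RateCount)
    d.λ == 0 && throw(DomainError(d.λ, "skewness is undefined for a point mass"))
    return 1 / sqrt(d.λ)
end
```

Either option keeps `RateCount(0)` constructible; they differ only in what happens when someone asks for the skewness of the degenerate case.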

StefanKarpinski commented 10 years ago

Also, there is a middle ground between handling cases in an ad hoc fashion and implementing everything at once, consistently: figure out a good principle and implement some cases, but don't try to deal with all of them right away.

nalimilan commented 10 years ago

That's what I was going to suggest. @johnmyleswhite's criteria are good, but we can wait for actual use cases to come up before implementing them. Starting with common cases is a good strategy.

StefanKarpinski commented 10 years ago

Yeah, having a coherent policy means that any time the issue comes up, everyone knows what to do.

lindahua commented 9 years ago

I have no problem with allowing zero rate for Poisson. However, allowing zero scale for continuous distributions seems to be a more complex problem. Dealing with atomic distributions is nontrivial. How can you tell an infinite density with probability mass 1.0 from that with probability mass 0.5?

lindahua commented 9 years ago

Also, now Poisson distributions depend on Rmath. Does R support zero rate?

andreasnoack commented 9 years ago

> rpois(10, 0)
 [1] 0 0 0 0 0 0 0 0 0 0

richardreeve commented 9 years ago

I've submitted pull request #398 to fix the Poisson(0) issue.

richardreeve commented 9 years ago

The Poisson(0) issue is now fixed since pull requests #398 and #401 have been merged.

simonbyrne commented 6 years ago

I would be keen to allow Normal with std dev = 0, as it comes up fairly often.

itsdebartha commented 7 months ago

I would very much like to incorporate Geometric with the success parameter p=1. I came across a situation in a simulation study of a response-adaptive treatment allocation and encountered some error relating to zero(p) < p < one(p) when an allocation probability becomes 1. Moreover, I think including p=1 will be generalising this distribution a bit more.

Am willing to create a PR if this seems a satisfactory addition to the people here...
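
For reference, a minimal sketch of what the relaxed check might look like, assuming the convention that the variable counts failures before the first success. `geometric_pmf` is a hypothetical helper written for this comment, not the package's implementation:

```julia
# Hypothetical helper: pmf of the number of failures before the first success,
# with the parameter check relaxed from 0 < p < 1 to 0 < p <= 1.
function geometric_pmf(p::Real, k::Integer)
    zero(p) < p <= one(p) || throw(ArgumentError("p must lie in (0, 1], got $p"))
    k < 0 && return 0.0
    # At p = 1 this gives 0^0 * 1 = 1 for k = 0 and 0 for every k > 0,
    # i.e. a point mass at zero.
    return float((one(p) - p)^k * p)
end

println(geometric_pmf(1.0, 0))   # all mass at k = 0
println(geometric_pmf(1.0, 3))   # no mass elsewhere
println(geometric_pmf(0.5, 1))   # ordinary interior case
```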