cscherrer opened this issue 5 years ago
Hey and thanks! This seems like a cool use case that I admittedly have never thought about before. Let me see if I have understood what it is you are trying to do: you want to construct a `Distribution{Particles}`, i.e., a distribution whose parameters are uncertain, and then sample from this `Distribution{Particles}` using `rand(d::Distribution{Particles})`.

How is this sampling expected to behave? Sample one distribution of the 1000 distributions represented by `Distribution{Particles{1000}}` and then sample a value from this distribution?
I can see three ways of representing these distributions:

- `Particles{Particles}`
- `Distribution{Particles}`
- `Particles{Distribution}`

Is it obvious that `Distribution{Particles}` is the preferred way? I have tried to make particles work together with `ForwardDiff.Dual` and have come to realize that it is not at all clear whether `Particles{Dual}` or `Dual{Particles}` is to be preferred; it really depends on the situation and how you want to dispatch. I often had to convert between the two.
> This seems like a cool use case that I admittedly have never thought about before.

Thanks!

> How is this sampling expected to behave? Sample one distribution of the 1000 distributions represented by `Distribution{Particles{1000}}` and then sample a value from this distribution?

That's how I would expect it to behave, yes.
> Is it obvious that `Distribution{Particles}` is the preferred way?

Not at all, `Particles{Distribution}` could be better. I think `Particles{Particles}` is right out, because other applications may be more interested in, say, `logpdf` than `rand`. I was thinking `Distribution{Particles}` because it seemed more consistent with your approach of having `Particles` only wrap primitives. I didn't know about the `Dual` issues at the time.
@cscherrer just a question. I was looking at your example code and tried it out, and I ran into a few issues. Here was my sequence:

```julia
using MonteCarloMeasurements, Distributions
μ = Particles(1000, Normal(0, 1))
σ = Particles(1000, Normal(0, 1))^2
Normal(μ, σ)
```

where I get the error:
```
ERROR: Comparison operators are not well defined for uncertain values and are currently turned off. Call `unsafe_comparisons(true)` to enable comparison operators for particles using the current reduction function Statistics.mean. Change this function using `set_comparison_function(f)`.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] _comparioson_operator at /Users/brandongomes/.julia/packages/MonteCarloMeasurements/nd0Nh/src/particles.jl:344 [inlined]
 [3] <=(::Particles{Float64,1000}, ::Particles{Float64,1000}) at /Users/brandongomes/.julia/packages/MonteCarloMeasurements/nd0Nh/src/particles.jl:360
 [4] >=(::Particles{Float64,1000}, ::Particles{Float64,1000}) at ./operators.jl:333
 [5] macro expansion at /Users/brandongomes/.julia/packages/Distributions/Iltex/src/utils.jl:5 [inlined]
 [6] Normal{Particles{Float64,1000}}(::Particles{Float64,1000}, ::Particles{Float64,1000}) at /Users/brandongomes/.julia/packages/Distributions/Iltex/src/univariate/continuous/normal.jl:35
 [7] Normal(::Particles{Float64,1000}, ::Particles{Float64,1000}) at /Users/brandongomes/.julia/packages/Distributions/Iltex/src/univariate/continuous/normal.jl:41
 [8] top-level scope at none:0
```
So now I continue as follows:

```julia
julia> unsafe_comparisons(true)
[ Info: Unsafe comparisons using the function `Statistics.mean` has been enabled globally. Use `@unsafe` to enable in a local expression only or `unsafe_comparisons(false)` to turn off unsafe comparisons

julia> Normal(μ, σ)
Normal{Particles{Float64,1000}}(
μ: 0.000425 ± 1.0
σ: 0.999 ± 1.4
)
```
So that works! Then I try the Bernoulli:

```julia
julia> Bernoulli(Particles(1000, Beta(2, 3)))
Bernoulli{Particles{Float64,1000}}(
p: 0.4 ± 0.2
)
```

and it works just fine. If I turn `unsafe_comparisons` back off, I get the same error as for the normal distribution. I don't get the error you got. I was wondering if this makes sense?
@baggepinnen understands this much better than I do, but I think the issue is how to compute `a < b` when `a` and `b` are `Particles`:

```julia
julia> unsafe_comparisons(true)
[ Info: Unsafe comparisons using the function `Statistics.mean` has been enabled globally. Use `@unsafe` to enable in a local expression only or `unsafe_comparisons(false)` to turn off unsafe comparisons

julia> Particles() < Particles()
false

julia> unsafe_comparisons(false)

julia> for f in [<=, >=, <, >]
           register_primitive(f)
       end

julia> Particles() < Particles()
Part500(0.494 ± 0.5)
```

`unsafe_comparisons` just does one comparison, using the means. For particle-wise computations, you need to `register_primitive`.
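As an aside, a middle ground between global `unsafe_comparisons(true)` and registering the comparison operators as primitives is the `@unsafe` macro mentioned in the info message above; a minimal sketch:

```julia
using MonteCarloMeasurements, Distributions

μ = Particles(1000, Normal(0, 1))
σ = Particles(1000, Normal(0, 1))^2

# Mean-based comparisons are allowed only inside this one expression,
# so the global comparison guard stays on.
d = @unsafe Normal(μ, σ)
```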
Oh, I see, `Bernoulli` breaks because the particles are coerced to be floating point. @baggepinnen, is it reasonable to have `Particles` of `Bool`, `Int`, etc.?
Welcome @bhgomes. Your error message is different (hopefully improved) since you use a newer version of the package. @cscherrer is correct in asserting that the error is thrown when particles appear in comparisons, e.g., `p < 0`, as it is not always clear how you want this comparison to be handled. I called the function that allows comparisons `unsafe_comparisons` to emphasize that one should really think through whether this is what one really wants. My long-term plan is to write a compiler pass that identifies when `Particles{Bool}` appear in a boolean context and transforms the code into a loop over a function containing the control flow, but this is above my head at the moment, and `register_primitive` will have to do for now, with its limitations.

It is certainly reasonable to have `Particles{Bool/Int}` etc. in the right situations, e.g., to represent binary distributions or histograms.
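For instance, a sketch of a binary distribution as `Particles{Bool}`, assuming the `Particles(::Vector)` constructor accepts `Bool` samples like any other `Real`:

```julia
using MonteCarloMeasurements, Distributions, Statistics

# Represent a Bernoulli(0.3) by its samples: Particles{Bool}
b = Particles(rand(Bernoulli(0.3), 1000))
mean(b)  # ≈ 0.3, the fraction of `true` samples
```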
The problem with `Distribution{Particles}` vs `Particles{Distribution}` is that one of the types behaves (dispatches) as a `Distribution` whereas the other behaves as `Particles`. Ideally, you would like the composite type to behave as both a distribution and as particles, at least in some situations, but this requires you to either convert between the two representations or implement quite a lot of new methods.

The easiest way forward, I suspect, is to identify which of the types `Particles`/`Distribution` has more special methods implemented for it, and let that type be the outermost. I have a feeling this leads to `Particles{Distribution}`. I do, however, believe that the representation as `Distribution{Particles}` is easier to reason about and sample from. Indeed, I can't even imagine how a sample would be drawn from `Particles{Distribution}`.
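To make the conversion between the representations concrete, here is a minimal sketch of going from struct of arrays to array of structs for a `Normal` (`to_aos` is a hypothetical helper; `nparticles` is from MonteCarloMeasurements):

```julia
using MonteCarloMeasurements, Distributions
import MonteCarloMeasurements: nparticles

# Distribution{Particles} -> a vector of plain Normals (one per particle)
to_aos(d::Normal{<:AbstractParticles}) =
    [Normal(d.μ[i], d.σ[i]) for i in 1:nparticles(d.μ)]
```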
I'm struggling to come up with a good solution to this. I think the following does what it's supposed to, but it is horribly inefficient:

```julia
using MonteCarloMeasurements, Distributions, BenchmarkTools
import MonteCarloMeasurements: particletype

struct ParticleDistribution{D}
    d::D
end
function ParticleDistribution(d::Type{<:Distribution}, p...)
    @unsafe ParticleDistribution(d(p...))
end
particletype(pd::ParticleDistribution) = particletype(getfield(pd.d, 1))

function Base.rand(d::ParticleDistribution{D}) where D
    T, N = particletype(d)
    i = rand(1:N)  # select one of the N represented distributions
    d2 = MonteCarloMeasurements.replace_particles(d.d, P -> P isa AbstractParticles, P -> P[i])
    rand(d2)  # draw a value from the selected distribution
end

pd = ParticleDistribution(Bernoulli, Particles(1000, Beta(2, 3)))
@btime rand($pd)            # 822 ns
@btime rand(Bernoulli(0.3)) # 4 ns
```
It seems that relative to `rand(Bernoulli(0.3))`, `rand($pd)` does 1000x the work in 200x the time, is that right? I'm missing how this shows inefficiency. But it's early here, so maybe I just need coffee :)
It doesn't really do 1000x the work; it just draws one random number, selects that distribution, and then draws a random number from that distribution, so two random numbers in total. The inefficiency comes from `replace_particles`, which does a lot of runtime code introspection, a problem I have been struggling to get around. One potential solution is to use the method linked in https://github.com/baggepinnen/MonteCarloMeasurements.jl/issues/21
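In other words, the sampler conceptually performs just two draws; a self-contained sketch for a `Normal` with `Particles` parameters:

```julia
using MonteCarloMeasurements, Distributions
import MonteCarloMeasurements: nparticles

μ = Particles(1000, Normal(0, 1))
σ = Particles(1000, Normal(0, 1))^2

i = rand(1:nparticles(μ))     # draw 1: pick one of the represented distributions
x = rand(Normal(μ[i], σ[i]))  # draw 2: sample a value from it
```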
Are the `isa` calls in `replace_particles` all compiled away? If not, would it be faster to turn each of these into a different method?

Or if generated functions would help, I've had good luck with GeneralizedGenerated.jl, which gives a lot more flexibility than the built-in `@generated`.
Some progress:

```julia
using MonteCarloMeasurements, Distributions, BenchmarkTools
import MonteCarloMeasurements: particletypetuple

struct ParticleDistribution{D,P}
    d::D
    constructor::P
end
function ParticleDistribution(constructor::Type{<:Distribution}, p...)
    @unsafe ParticleDistribution(constructor(p...), constructor)
end
particletypetuple(pd::ParticleDistribution) = particletypetuple(getfield(pd.d, 1))
# Type-level version; the original referenced an undefined `pd`, fixed here
# by reading the field types instead:
particletypetuple(::Type{D}) where D <: ParticleDistribution = particletypetuple(fieldtype(fieldtype(D, :d), 1))

@generated function Base.rand(d::ParticleDistribution{D}) where D
    indtuple = ()
    N = 0
    for field in fieldnames(D)
        FT = fieldtype(D, field)
        if FT <: AbstractParticles
            N = particletypetuple(FT)[2]
            indtuple = (indtuple..., :(d.d.$field[i]))
        else
            indtuple = (indtuple..., :(d.d.$field))
        end
    end
    tupleex = Expr(:tuple, indtuple...)
    quote
        i = rand(1:$N)
        d2 = d.constructor($(tupleex)...)
        # d2 = newstruct(d.constructor{typeof.($(tupleex))...}, $(tupleex)...) # This can be used to bypass a missing inner constructor, but is a bit slower
        rand(d2)
    end
end

pd = ParticleDistribution(Bernoulli, Particles(1000, Beta(2, 3)))
@btime rand($pd)            # 194.651 ns (2 allocations: 32 bytes)
@btime rand(Bernoulli(0.3)) # 10.050 ns (0 allocations: 0 bytes)

pd = ParticleDistribution(Normal, Particles(1000, Normal(10, 3)), Particles(1000, Normal(2, 0.1)))
@btime rand($pd)            # 149.837 ns (4 allocations: 80 bytes)
@btime rand(Normal(10, 2))  # 12.788 ns (0 allocations: 0 bytes)
```
Some assumptions made:

- Particles appear in the fields of `d.d` and no deeper.
- A missing inner constructor can be bypassed using `newstruct` from https://github.com/JuliaIO/BSON.jl/blob/95dc736ec99dcd6bbf1abb59fdefece85a9bc3c7/src/extensions.jl#L104

I have been thinking more about this and might have changed my mind; `Particles` are a subtype of `Real` and contain a collection of `Real` samples such that each sample on its own is useful and can be propagated through the same functions as `Particles` can be applied to. If a hierarchical distribution were represented by a collection of individual distributions, we would have the same situation, i.e., `Particles{Distribution} <: Distribution`. Each sample would itself be a distribution, just like a sample of `Particles{Real}` is a `Real`. This would make sampling a random number from the distribution very easy: just sample an index and then sample from that distribution. I am not sure how the calculation of `logpdf` would work; just average all sample `logpdf`s?
Where the reverse representation, i.e., `Distribution{Particles}`, comes in handy is during construction and display; it is very convenient to construct a hierarchical distribution by thinking about the outer distribution's parameters in terms of particles. This can be facilitated with special constructors, though, and is decoupled from the internal representation.

Is there any situation in which computation is made easier by the representation as `Distribution{Particles}`, like the example in my last post?
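One candidate answer: parameter-level queries are immediate in the struct-of-arrays form, since the parameters are themselves `Particles` (a sketch; `@unsafe` gets past the constructor's argument checks):

```julia
using MonteCarloMeasurements, Distributions, Statistics

d = @unsafe Normal(Particles(1000, Normal(0, 1)), Particles(1000, Normal(0, 1))^2)
mean(d.μ), std(d.μ)  # uncertainty of the location parameter, read off directly
```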
Note to self: it's quite possible that StructArrays.jl can significantly simplify how we deal with the `Particles{Distribution}` / `Distribution{Particles}` dilemma.
This is starting to become useful now. With the code below, particle distributions are represented as a list of regular distributions, but they are constructed and printed as if they were `Distribution{Particles}`. Drawing random numbers from such a distribution is very fast; it costs roughly the same as drawing one integer and one number from the underlying distribution. This gets around the problem with distributions performing argument checks inside their constructors, since each constructor is called with normal scalars.

@cscherrer How would you envision calculating the `logpdf` for such a distribution? I am not sure which one of

```julia
logpdf(pd, x) = mean(logpdf(d, x) for d in pd.d)
pdf(pd, x)    = mean(pdf(d, x) for d in pd.d)
```

is correct.
```julia
using MonteCarloMeasurements, Distributions, BenchmarkTools
import MonteCarloMeasurements: nparticles, indexof_particles

struct ParticleDistribution{D,P}
    d::D
    constructor::P
end
function ParticleDistribution(constructor::Type{<:Distribution}, p...)
    # One regular distribution per particle index
    dists = [constructor(getindex.(p, i)...) for i in 1:nparticles(p[1])]
    ParticleDistribution(dists, constructor)
end

function Base.rand(d::ParticleDistribution{D}) where D
    ind = rand(1:length(d.d))  # pick one of the underlying distributions
    rand(d.d[ind])             # draw a value from it
end

function Base.show(io::IO, d::ParticleDistribution)
    T = eltype(d.d)
    fields = map(fieldnames(T)) do fn
        getfield.(d.d, fn)
    end
    println(io, "Particle", T, "(")
    for (i, fn) in enumerate(fieldnames(T))
        println(io, string(fn), ": ", Particles(fields[i]))
    end
    println(io, ")")
end

pd = ParticleDistribution(Bernoulli, Particles(1000, Beta(2, 3)))
@btime rand($pd)            # 23.304 ns (0 allocations: 0 bytes)
@btime rand(Bernoulli(0.3)) # 10.050 ns (0 allocations: 0 bytes)

pd = ParticleDistribution(Normal, Particles(1000, Normal(10, 3)), Particles(1000, Normal(2, 0.1)))
@btime rand($pd)            # 27.726 ns (0 allocations: 0 bytes)
@btime rand(Normal(10, 2))  # 12.788 ns (0 allocations: 0 bytes)
```
```julia
julia> pd
ParticleNormal{Float64}(
μ: 10.0 ± 3.0
σ: 2.0 ± 0.1
)
```
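On the `logpdf` question above: if the list of distributions is read as an equally weighted mixture over parameter draws, the marginal density is the mean of the pdfs, so `mean(logpdf(...))` (the expected logpdf) and the mixture logpdf are different quantities. A numerically stable sketch of the latter, assuming StatsFuns for `logsumexp`:

```julia
using Distributions
using StatsFuns: logsumexp

# log of the average pdf over the particle distributions
marginal_logpdf(pd, x) = logsumexp([logpdf(d, x) for d in pd.d]) - log(length(pd.d))
```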
This looks great! I wonder, how far can "particle semantics" be pushed? For the last `pd` in your example, could `rand(pd)` return a `Particles` by drawing one sample from each distribution in the `ParticleDistribution`?

Pretend `ParticleDistribution` had a `.particles` field (I guess `getproperty` could even make this a reality). Then something like

```julia
rand(pd) = Particles(rand.(pd.particles))
logpdf(pd, x) = Particles(logpdf.(pd.particles, x))
```

There may be a more efficient implementation, but would this work? At what point do the semantics break down?
I'm sure it can be made to work, but I'm struggling to understand what it would represent.

- For `logpdf`: how would one use this particle representation of the logpdf?
- For `rand`: a random number could just as well be drawn from a single, randomly selected `Distribution` from the collection.

I am very open to being convinced of its utility; it feels like hitherto untrodden land :)
In a typical MCM use case, say you have some function `f(x::Real)` that returns another `Real`. If you give that function a `Particles`, you might expect it to return another `Particles`, which you can use to propagate uncertainty.

Now what if `f` looks like, say,

```julia
function f(x)
    return rand(Normal(x, 1))
end
```

This fits into the same framework; the only difference is that now there's uncertainty other than from the particles themselves.

This comes up all the time in probabilistic programming. We have uncertainty in our parameter estimates (because they're samples from the posterior distribution), which we propagate through the observation model for forecasting or posterior predictive checks.
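Concretely, the semantics I'd hope for, sketched via the `particles` field (not necessarily how the package would implement it):

```julia
using MonteCarloMeasurements, Distributions

f(x) = rand(Normal(x, 1))

p = Particles(1000, Normal(0, 1))
y = Particles(f.(p.particles))  # apply f sample-wise, rewrap as Particles
```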
I buy that! This branch now follows the semantics you outlined, i.e.,

```julia
rand(pd) = Particles(rand.(pd.particles))
logpdf(pd, x) = Particles(logpdf.(pd.particles, x))
```

It appears to be reasonably efficient to use the naive (but type-stable) way of drawing random numbers:

```julia
julia> pd = ParticleDistribution(Normal, 1 ± 0.1, 1 ± 0.1)
ParticleNormal{Float64}(
μ: 1.0 ± 0.1
σ: 1.0 ± 0.1
)

julia> rand(pd)
Part10000(1.012 ± 1.01)

julia> @btime rand($pd)
  116.768 μs (3 allocations: 78.22 KiB)
Part10000(0.996 ± 0.997)

julia> @btime 1 .+ 0.1 .* randn(10000);
  89.188 μs (4 allocations: 156.41 KiB)
```
For some cases there are some nice optimizations, e.g.,

```julia
rand(ParticleDistribution(Normal, m, s)) == m + s * Particles(Normal())
```

(assuming I got that right)
With the current representation we have an array of structs; `m + s * Particles(Normal())` implies the struct-of-arrays representation. Struct of arrays is often (but not always) faster, but also harder to use. With the struct of arrays (`Distribution{Particles}`), we are back at the problems with Distribution constructors etc., and special methods would have to be implemented for each subtype of `Distribution`; alternatively, one could use the slower fallback outlined in https://github.com/baggepinnen/MonteCarloMeasurements.jl/issues/22#issuecomment-578967057.

One could also store both representations inside a `ParticleDistribution` and use whichever is most efficient for a particular task, at the expense of a 2x memory overhead.
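A sketch of what carrying both layouts could look like (a hypothetical type; nothing like this exists in the package):

```julia
using Distributions

# Store both layouts; dispatch each operation to whichever is cheaper for it.
struct DualRepParticleDistribution{A<:AbstractVector{<:Distribution},S<:Distribution}
    aos::A  # array of structs: one plain Distribution per particle
    soa::S  # struct of arrays: one Distribution with Particles parameters
end
```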
The naive sampling outlined in my previous post is actually nearly as fast as the struct-of-arrays version; Julia must do a very good job optimizing the broadcast of `rand` in `Particles(rand.(rng, d.d))`:

```julia
julia> @btime rand($pd);
  119.941 μs (3 allocations: 78.22 KiB)

julia> @btime $m + $s*Particles{Float64, 10000}(randn(10000));
  101.810 μs (9 allocations: 234.66 KiB)
```
That's really close! I'd guess the latter might be a little more stable, since it takes advantage of stratified sampling. But I'm not sure about that.
You mentioned https://github.com/JuliaArrays/StructArrays.jl, did that seem to fit here?
It might be a bit more stable thanks to the systematic sampling, but that is significantly slower:

```julia
julia> @btime $m + $s*Particles{Float64, 10000}(systematic_sample(10000, Normal()));
  535.076 μs (13 allocations: 391.06 KiB)
```

mostly due to `quantile` being slower than `randn`:

```julia
julia> @btime quantile(Normal(), 0.6);
  18.838 ns (0 allocations: 0 bytes)

julia> 0.018 * 10000  # ≈ 0.018 μs per quantile call × 10000 particles
180.0
```

but also since the particles after systematic sampling have to be permuted.
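For intuition, a minimal sketch of systematic sampling (not the package's exact implementation): evenly spaced quantile levels are pushed through the inverse CDF, then permuted so the particles are uncorrelated.

```julia
using Distributions, Random

function systematic_sample_sketch(N, d)
    u = ((1:N) .- 0.5) ./ N    # evenly spaced levels in (0, 1)
    x = quantile.(Ref(d), u)   # one (slow) quantile call per particle
    shuffle!(x)                # permute to break the ordering
end
```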
Oh interesting, I thought `Particles()` used systematic sampling by default. I think I'm misunderstanding something.
No, you are correct; it's just that in my previous benchmark I bypassed that by calling `randn` manually, so I made the systematic sampling explicit so it would be clearer what's going on.
Really enjoying this package, thank you for your work on it.

For Soss.jl, here's a simple model:

For simple examples like this, I can now use `MonteCarloMeasurements` to get

The way I get there... it's not so pretty. Let's start with μ and σ:
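Presumably these are the definitions bhgomes quotes earlier in the thread:

```julia
# reconstructed from bhgomes's quote above (assumption)
μ = Particles(1000, Normal(0, 1))
σ = Particles(1000, Normal(0, 1))^2
```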
Surprisingly, this works just fine:
Now trying the obvious thing,
It thinks it works, but it's really horribly broken:
So I got it to work with a little helper function:
So now I have
Much better.
This is fine for one distribution, but most don't compose as nicely as a normal.
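To make "compose nicely" concrete: the normal has a location-scale form, so uncertain parameters push through affinely; most families have no analogous identity (a sketch):

```julia
using MonteCarloMeasurements, Distributions

μ = Particles(1000, Normal(0, 1))
σ = Particles(1000, Normal(0, 1))^2

x = μ + σ * Particles(1000, Normal(0, 1))  # behaves like Normal(μ, σ) particle-wise
# No such two-parameter affine form exists for, e.g., Beta or Gamma.
```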
Many distributions break entirely, seemingly because of argument checking in `Distributions.jl`:

Do you see a path toward getting `rand` to work properly?