Closed by ibab 8 years ago
Joining independent variables should be easy to implement in my opinion, taking inspiration from the current `Mixture` class.
Defining dependent variables as in your example might be tricky, however, as it starts looking more and more like defining a probabilistic graphical model, for which tools like Stan are certainly better suited.
You're right that Stan's approach of defining a probabilistic programming language is much better suited to these problems.
In practice, though, I haven't managed to find a probabilistic programming tool that's suited to the kind of work I'm doing. For example, Stan can only evaluate using a single thread (and doesn't have GPU support). The single-threadedness assumption runs quite deep in many parts of the codebase, such as the reverse-mode automatic differentiation.
So I agree that a probabilistic programming language would be the best approach here, but I also like the simplicity of the `scipy.stats`-like API that carl currently has.
I think @ibab is looking for the functionality that we have in RooFit, but with a Pythonic interface. RooFit is a probabilistic programming language and has several operator classes (convolution, mixture, joint product, various interpolations, ...). These can, of course, be seen as a probabilistic graphical model. One issue with Stan is that it is tightly coupled to its inference algorithms, while in RooFit the modeling is quite separate from the inference part (RooStats).
Within http://diana-hep.org we have a computer science PhD student who is investigating how to parallelize and vectorize the computation of RooFit models (splitting on events, categories, sums, etc.). They have experience with both distributed computing clusters and GPUs. So far they've said that this would all be much easier in something like carl.
It is very interesting to consider this alternative, but we should be very thoughtful before devoting too much time to it.
I'm actually looking for a RooFit alternative that's more heavily integrated into the NumPy/SciPy/Matplotlib ecosystem and that can easily be run on GPUs or multiple CPUs. I've been using RooFit/RooStats quite heavily over the past few years and seem to have arrived at the same conclusion as your student. Maybe I should have made it clear that I'm interested in implementing the issues I opened myself.
If the focus of this package is to remain on ABC, maybe we should start a new package, or I could implement some of Gilles' excellent API ideas in https://github.com/ibab/python-mle.
Hi @ibab, great to hear you are interested in contributing. As I'm sure you can guess, it's a sensitive matter. I was looking at your python-mle recently as well. I suggest that we try to organize a conversation with some of the interested parties.
Closing this. I added `carl.distributions.Join` recently (though it still needs further improvements to be fully composable).
It should be possible to combine several distributions into a higher-dimensional one. The combined distribution would describe all the random variables of the sub-distributions and represent their joint probability.
Example:
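(A toy, runnable sketch of the idea: `scipy.stats` frozen distributions stand in for carl's distribution classes, and the `Joint` helper, including its `components` argument, is hypothetical.)

```python
import numpy as np
from scipy import stats


class Joint:
    """Toy stand-in for the proposed class: it combines independent
    1-D distributions into one higher-dimensional joint distribution."""

    def __init__(self, components):
        self.components = components

    def pdf(self, X):
        # Independence: the joint density factorizes over the columns of X.
        X = np.atleast_2d(X)
        return np.prod([c.pdf(X[:, i])
                        for i, c in enumerate(self.components)], axis=0)

    def rvs(self, n_samples):
        # Sample each component and lay the draws out side by side.
        return np.column_stack([c.rvs(size=n_samples)
                                for c in self.components])


p1 = stats.norm(loc=0.0, scale=1.0)   # describes X[:, 0]
p2 = stats.expon(scale=0.5)           # describes X[:, 1]

p_all = Joint(components=[p1, p2])    # joint over both variables

X = p_all.rvs(100)                    # X.shape == (100, 2)
densities = p_all.pdf(X)              # one joint density value per row
```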
Or a very minimal syntax:
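(Again hypothetical, reusing `stats` and the toy `Joint` helper from the sketch above.)

```python
# The same joint in a single line:
p_all = Joint(components=[stats.norm(0.0, 1.0), stats.expon(scale=0.5)])

# Or, if the distribution classes overloaded an operator for this,
# perhaps as compact as:
#
#     p_all = p1 * p2
```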
Assuming that for `p1`, `X.shape` is `(N, a)`, and for `p2`, `X.shape` is `(N, b)`, then for `p_all`, `X.shape` would be `(N, a + b)`.
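Concretely, in terms of array shapes (a plain NumPy illustration of this convention; `N`, `a`, `b`, and the particular sampling calls are arbitrary):

```python
import numpy as np

N, a, b = 1000, 2, 3
X1 = np.random.normal(size=(N, a))        # data that p1 would evaluate
X2 = np.random.exponential(size=(N, b))   # data that p2 would evaluate

# The joined distribution p_all sees the columns side by side:
X_all = np.hstack([X1, X2])
assert X_all.shape == (N, a + b)
```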
In order to define a general `p(X, Y)` where `X` and `Y` can be correlated, you could first define the distribution `p(X | Y)` and then use `Joint` to combine it with `p(Y)`. Maybe `.observeds_` could be used for the conditional `Y` in the first distribution.

An example: assume we want to model a two-dimensional joint `p(x, y) = p(x | y) p(y)` in which `x` depends on `y`.
We could then define something like
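the following, written here as a runnable toy sketch rather than actual carl code (`ConditionalNormal`, the linear dependence of `x` on `y`, and the explicit product inside `p` are hypothetical stand-ins for `p(x | y)` and the proposed `Joint`):

```python
import numpy as np
from scipy import stats


class ConditionalNormal:
    """Toy p(x | y): a normal whose mean is a linear function of y."""

    def __init__(self, slope, sigma):
        self.slope, self.sigma = slope, sigma

    def pdf(self, x, y):
        return stats.norm(loc=self.slope * y, scale=self.sigma).pdf(x)


p_y = stats.norm(loc=0.0, scale=1.0)                    # p(y)
p_x_given_y = ConditionalNormal(slope=0.5, sigma=1.0)   # p(x | y)


def p(X):
    """Joint p(x, y) = p(x | y) * p(y), evaluated on X of shape (N, 2)."""
    x, y = X[:, 0], X[:, 1]
    return p_x_given_y.pdf(x, y) * p_y.pdf(y)


data = np.column_stack([np.random.normal(size=100),   # x column
                        np.random.normal(size=100)])  # y column
densities = p(data)
```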
And `p` would take `data` with `data.shape == (N, 2)` as the `X` argument.