Closed lindahua closed 11 years ago
Yes, these would be very helpful.
I have finally gotten around to working on this.
I'd like to go through and ensure this method exists for every distribution. Are you opposed to changing the order of arguments to logpdf!(d, x, r) instead of the current logpdf!(r, d, x)?
The reason for using logpdf!(r, d, x) is to keep it consistent with other kinds of probabilistic models that involve multiple variables (e.g. conditional distributions).
Consider a simple model such as the one below:
y ~ N(a' x, sigma)
This is a probabilistic formulation of a linear regression model. In such a model I wish to be able to write
logpdf!(r, d, x, y)
In some more generic algorithms (e.g. estimation of finite mixture models), it is nice to be able to write
logpdf!(r, d, x...)
I am actually using such syntax in a probabilistic inference package, which I am still working on.
Within the scope of this package, I think either way is fine. However, the output-first form logpdf!(r, d, x) makes it possible to enforce consistency across packages, from a broader perspective.
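To make the argument concrete, here is a minimal sketch of how the output-first order lets one generic function serve both a plain distribution and a conditional model. The types Gauss and LinearGauss and the helper total_loglik! are hypothetical names made up for this illustration (they are not from Distributions.jl), and the code uses current Julia syntax.

```julia
# Hypothetical types for illustration only; not part of any package.
struct Gauss            # univariate normal with mean μ and standard deviation σ
    μ::Float64
    σ::Float64
end

struct LinearGauss      # conditional model y ~ N(a' x, σ)
    a::Vector{Float64}
    σ::Float64
end

const LOG2PI = log(2π)

# Output first, model second, then one data argument.
function logpdf!(r::AbstractVector, d::Gauss, x::AbstractVector)
    for i in eachindex(r, x)
        z = (x[i] - d.μ) / d.σ
        r[i] = -0.5 * z^2 - log(d.σ) - 0.5 * LOG2PI
    end
    return r
end

# Same leading arguments, two data arguments: each column of X is one input x,
# and y holds the corresponding responses.
function logpdf!(r::AbstractVector, d::LinearGauss, X::AbstractMatrix, y::AbstractVector)
    for i in eachindex(r, y)
        m = sum(d.a[k] * X[k, i] for k in eachindex(d.a))   # a' x for sample i
        z = (y[i] - m) / d.σ
        r[i] = -0.5 * z^2 - log(d.σ) - 0.5 * LOG2PI
    end
    return r
end

# Generic code only relies on "output first, model second"; the data is forwarded as varargs.
function total_loglik!(r, d, data...)
    logpdf!(r, d, data...)
    return sum(r)
end
```

With the output at the end instead, the generic function would have to peel the last argument off the varargs tuple before forwarding the rest, which is possible but more awkward.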
The varargs case is a very compelling argument. Do we have any distributions currently implemented that use varargs?
I am a big fan of consistency, so I'd like to clean this up. What troubles me is that we already have inconsistencies: rand!(d, A) has the mutated argument at the end, whereas logpdf!(r, d, x, ..) has it at the front.
This issue was addressed weeks ago, so I am closing it.
Ok. I do still wish we could standardize on placing the mutable arguments to functions at the front of the argument list, but this would be a major change to rand!.
I think the following is useful.
Most of the important distributions (the Uniform distribution being a notable exception) belong to the exponential family. This means that the core part of computing logpdf is evaluating a dot product between the parameters and the sufficient statistics. When evaluating logpdf for a set of samples, BLAS functions can be used to speed up the computation (often drastically).
Currently, batch evaluation is implemented for many univariate distributions, but it is still lacking for some multivariate distributions.
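As a rough illustration of the dot-product idea, here is a sketch for the univariate normal, written in exponential-family form with sufficient statistics [x, x²]. The function names normal_natural and batch_logpdf! are made up for this example, and a real implementation would avoid allocating the statistics matrix on every call. The point is only that the whole batch reduces to one matrix-vector product, which mul! hands to BLAS for Float64 arrays.

```julia
using LinearAlgebra

# Natural parameters η and log-partition A of N(μ, σ) (σ is the standard deviation):
#   log p(x) = η[1]*x + η[2]*x^2 - A - 0.5*log(2π)
function normal_natural(μ, σ)
    η = [μ / σ^2, -1 / (2 * σ^2)]
    A = μ^2 / (2 * σ^2) + log(σ)
    return η, A
end

# Hypothetical batch evaluation: fill r with the log-density of every sample in x.
function batch_logpdf!(r::Vector{Float64}, μ::Float64, σ::Float64, x::Vector{Float64})
    η, A = normal_natural(μ, σ)
    T = hcat(x, x .^ 2)          # n × 2 matrix of sufficient statistics
    mul!(r, T, η)                # r = T * η, one matrix-vector product
    r .-= A + 0.5 * log(2π)      # subtract log-partition and base-measure constant
    return r
end
```

For multivariate distributions the statistics matrix is larger and the gain from a single BLAS call is correspondingly bigger.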
In-place evaluation is also important. In a lot of inference/estimation algorithms (e.g. EM), one has to repeatedly evaluate logpdf at each iteration (on the same set of samples). It would be much more efficient to put the results into a pre-allocated array rather than creating a new array every time.
Generally, I think we can do it in this way: implement a specialized logpdf! method for each distribution type, and write a generic logpdf on abstract distributions that allocates the result array and delegates to logpdf!.
Similar things can be done for discrete distributions, and we should do the same for pdf.
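A minimal sketch of this pattern, in current Julia syntax, might look like the following. The abstract type ContinuousUnivariate and the Expo distribution are stand-ins invented for the example, not the package's actual definitions.

```julia
# Hypothetical stand-in for the package's abstract distribution type.
abstract type ContinuousUnivariate end

struct Expo <: ContinuousUnivariate   # exponential distribution with rate λ
    λ::Float64
end

# Specialized in-place method: one of these per distribution type.
function logpdf!(r::AbstractArray, d::Expo, x::AbstractArray)
    for i in eachindex(r, x)
        r[i] = x[i] >= 0 ? log(d.λ) - d.λ * x[i] : -Inf
    end
    return r
end

# Generic allocating version, written once on the abstract type.
logpdf(d::ContinuousUnivariate, x::AbstractArray) = logpdf!(similar(x, Float64), d, x)

# pdf can be layered on top in the same way.
pdf(d::ContinuousUnivariate, x::AbstractArray) = exp.(logpdf(d, x))

# In an iterative algorithm the buffer is allocated once and reused.
x = [0.5, 1.0, 2.0]
r = similar(x)
for λ in (0.5, 1.0, 2.0)      # e.g. parameter updates across iterations
    logpdf!(r, Expo(λ), x)    # no new array is created here
end
```

Each new distribution then only needs its own logpdf! method; the allocating logpdf and pdf come for free from the abstract type.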