ReactiveBayes / ExponentialFamilyProjection.jl

A library to project an arbitrary function onto an exponential family member distribution using manifold optimization

A bit strange way to implement an MLE estimator. #29

Closed · Nimrais closed this issue 1 month ago

Nimrais commented 1 month ago

I am a bit confused about how the MLE estimator is implemented. The strategy field in CVICostGradientObjective was intended for different computational strategies for computing the KL divergence. However, we can change the whole behavior of CVICostGradientObjective through it, which makes the naming confusing. The MLE objective is technically a different objective.

For example, why do we need supplementary_η to run the MLE estimator? Or why do we need rng inside the MLE strategy? It's never used inside prepare_state!.

In some ways, I see that it's easy to re-use https://github.com/ReactiveBayes/ExponentialFamilyProjection.jl/blob/main/src/cvi.jl#L14, because it is partially the same logic: compute the Fisher information, compute the cost, compute the gradient. But the naming makes what is going on inside confusing and hard to read.
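For reference, a rough sketch of the shape of the objective under discussion; the fields shown here are assumptions pieced together from this thread, not a copy of src/cvi.jl:

```julia
# Hypothetical sketch, not the actual definition in src/cvi.jl:
struct CVICostGradientObjective{F,S}
    targetfn :: F   # the log-density function (or samples) being projected
    strategy :: S   # e.g. ControlVariateStrategy or MLEStrategy
    # ... plus supplementary_η and other state discussed below
end
```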

bvdmitri commented 1 month ago

In general, I agree that the naming could be better and that the API may undergo some adjustments overall. The initial idea for projection was to project functions; however, we quickly realized that this would not be enough for RxInfer and that we needed a similar mechanism to find the closest member of the exponential family given a set of samples. Initially @ismailsenoz implemented the MLE as a separate function without the strategy parameter, but it was essentially a copy-paste of the existing functionality, which was also far from ideal. It was also confusing to use, since the parameters and the function calls differed from project_to. The current implementation tries to reduce the burden on the user by hiding this and providing an identical interface both for functions and for samples.
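As a minimal sketch of that unified interface (the two call forms follow from this thread; how the strategy is selected from the argument type is my assumption):

```julia
using ExponentialFamilyProjection, Distributions

prj = ProjectedTo(Beta)

# Projecting a log-density function (the original use case,
# internally handled by the control-variate strategy):
approx_fn = project_to(prj, (x) -> logpdf(Beta(2, 3), x))

# Projecting a set of samples (the MLE use case): the call looks
# identical, only the type of the second argument differs.
samples   = rand(Beta(2, 3), 1000)
approx_ms = project_to(prj, samples)
```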

The strategy field in CVICostGradientObjective was intended for different computational strategies to compute KL divergence.

I see what you mean, but I was looking at it as a strategy primarily for computing the gradient for gradient descent. The cost is not really used in practice (mostly for debugging).

However we can change the whole behavior of the CVICostGradientObjective trough it

The overall behavior remains unchanged; only the method for computing the gradients changes. Everything else is entirely the same.
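A conceptual sketch of that separation (the flow and the name compute_cost_gradient are illustrative assumptions, not the package internals; fisherinformation is the ExponentialFamily.jl function):

```julia
# Hypothetical flow of the objective: the surrounding steps are shared,
# only the strategy-specific cost/gradient computation dispatches.
function (objective::CVICostGradientObjective)(M, p)
    ef     = convert(ExponentialFamilyDistribution, M, p)  # assumed conversion step
    fisher = fisherinformation(ef)                         # shared step
    # strategy-specific part: ControlVariateStrategy, MLEStrategy, ...
    cost, grad = compute_cost_gradient(objective.strategy, ef, fisher)
    return cost, grad
end
```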

Or why do we need rng inside the MLE strategy? It's never used inside prepare_state!.

True... but we need it to get an initial point...

why do we need supplementary_η to run the MLE estimator?

We don't need it for MLEStrategy, just as we don't need it for ControlVariateStrategy. It's an option to "shift" the natural parameters if the user knows what they are doing.
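Roughly, the idea of the shift looks like this (a sketch; the sign convention is an assumption, not taken from the package):

```julia
# Hypothetical illustration: supplementary_η "shifts" the natural
# parameters η before the objective is evaluated, e.g. when projecting
# a product of densities while optimizing only one factor.
η_shifted = copy(η)
for s_η in supplementary_η
    η_shifted .-= s_η   # subtracting here is an assumed convention
end
```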

the naming just makes what is going on inside confusing and hard to read

Perhaps you are right. I don't have a problem with this naming, but we can consider renaming CVICostGradientObjective if you prefer, maybe to ProjectionCostGradientObjective, to avoid confusion. However, the logic is exactly what you've described: compute the Fisher information, compute the cost, compute the gradient.

In the future we can add more strategies, both for samples and for logpdf arguments, but they would all work similarly and only define different ways to compute the cost and the gradients.

bvdmitri commented 1 month ago

An immediate good change would be to change the way we compute the initial point: provide rng as an argument and remove rng from the state, indeed.
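A minimal sketch of that change (the function name is hypothetical; rand(rng, M) is the standard Manifolds.jl call for drawing a random point on a manifold M):

```julia
using Random: AbstractRNG

# Hypothetical after-state of the proposed change: rng is passed
# explicitly where the initial point is created, instead of being
# stored inside the strategy state.
function get_initial_point(rng::AbstractRNG, M)
    return rand(rng, M)   # random point on the manifold M
end
```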

Nimrais commented 1 month ago

Improved in https://github.com/ReactiveBayes/ExponentialFamilyProjection.jl/pull/30.