Closed tomerk closed 8 years ago
This looks pretty good - most of my comments were on naming and comments. The unit test for GMM should be a little more robust, and I left an idea about how to do that. There's a minor but pretty crucial note about the use of +=
in there, and I'd like to take another pass with my "how could in-place ops be breaking things" hat on before we merge this.
@etrain so I think the only things left are adding a reference to the GMM paper (what is the paper?) And figuring out the naming / name spaces.
The C GMM estimator is nodes.learning.GaussianMixtureModelEstimator I'm fine moving that around and/or renaming ScalaGMMEstimator, just let me know what you'd like each to be.
Let's put the let's have a nodes.learning.external.GaussianMixtureModelEstimator
which is the C thing (and be sure to update all existing pipelines to use this) and then name the scala one just nodes.learning.GaussianMixtureModelEstimator
. We've got this convention on the featurizers as well, so it makes sense to me organizationally.
Appendix B of the following paper has guidelines for training the GMM:
Jorge Sanchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek. Image Classification with
the Fisher Vector: Theory and Practice. International Journal of Computer Vision, Springer
Verlag, 2013, 105 (3), pp.222-245. <10.1007/s11263-013-0636-x>.
Okay @etrain how's this looking?
It doesn't look like you updated the existing pipelines that use the GMM/FV nodes to point to the external namespace?
Ah, I can see where the confusion comes from. So, the pipelines never use the GaussianMixtureModelEstimator directly, they only use the GMMFisherVectorEstimator (already in external), which I have now changed to point at the external GMM Estimator. And, the ones that use FisherVector were already pointing at the external.FisherVector node.
Ah, I see - sorry, I should have verified :)
LGTM!
for issue #181, and paves the way for issue #174