Implement SKLearn's Gaussian Mixture Model #1

NVSeismoLab / eqclustering

Python module of earthquake clustering algorithm of Zaliapin et al.

Apache License 2.0


Hi! I'm currently using this for a project investigating earthquakes. I just forked the project and have already made some improvements to the code, but I think the biggest improvement would be to base the calculations on an optimized library such as sklearn.

When running the clustering method on large data (400k items), the algorithm is really slow. Sklearn already has this model implemented: http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture
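For reference, a minimal stand-alone sketch of the sklearn side could look like the following. It is not eqclustering's code: the helper name `threshold_from_mixture`, the synthetic `log_eta` sample, and the choice of threshold (the point between the two component means where both components are equally likely) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def threshold_from_mixture(log_eta, random_state=0):
    """Fit a 2-component 1-D Gaussian mixture and return the value between
    the two means where both components have equal posterior probability.

    Hypothetical helper, not part of eqclustering.
    """
    # sklearn expects a 2-D array of shape (n_samples, n_features)
    x = np.asarray(log_eta, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=random_state).fit(x)

    # Search only between the two fitted means
    lo, hi = np.sort(gmm.means_.ravel())
    grid = np.linspace(lo, hi, 2001).reshape(-1, 1)

    # Posterior probability of each component at every grid point
    post = gmm.predict_proba(grid)
    idx = np.argmin(np.abs(post[:, 0] - post[:, 1]))
    return float(grid[idx, 0])


# Synthetic bimodal sample standing in for log10 nearest-neighbor distances
rng = np.random.default_rng(42)
log_eta = np.concatenate([
    rng.normal(-8.0, 1.0, 5000),  # assumed "clustered" mode
    rng.normal(-3.0, 1.0, 5000),  # assumed "background" mode
])
c = threshold_from_mixture(log_eta)
print(f"estimated threshold: {c:.2f}")
```

Whether a value obtained this way matches what the package expects as its threshold would need to be checked against the existing code.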

I'd be more than happy to integrate it into the code, but I get a bit lost and don't really know where to plug it in.

How likely is it that you would implement it or maybe guide me in the process?

Great work!

In my opinion, the bottleneck of the NN method is not the mixture modeling (only needed if you do not provide a c in .prune()), but the call to t.grow(), which loops over every event i. For each event i, it has to consider all previous events as potential nearest neighbors, which essentially makes this an O(n²) operation. I tried to speed up BPFunction.__call__() and BPFunction.dist() with numba, but it complains, which is strange because these functions only use supported methods. Maybe someone else will be more successful, or has other ideas for speeding up the loop.
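To illustrate the quadratic loop described above, here is a stand-alone sketch that is not taken from eqclustering. The function name `nearest_parents`, the proximity formula `dt * r**d * 10**(-b * m)`, the planar coordinates, and the default values of `d` and `b` are assumptions standing in for whatever BPFunction actually computes. The inner scan over previous events is vectorised with NumPy so that only the outer loop runs in Python; the total work is still O(n²).

```python
import numpy as np


def nearest_parents(t, x, y, m, d=1.6, b=1.0):
    """For each event i, find the earlier event j with the smallest
    assumed proximity dt * r**d * 10**(-b * m_j).

    Hypothetical helper; events must be sorted by time t.
    Returns (parent index or -1, proximity or inf) per event.
    """
    t, x, y, m = (np.asarray(a, dtype=float) for a in (t, x, y, m))
    n = len(t)
    parent = np.full(n, -1, dtype=np.int64)
    eta = np.full(n, np.inf)

    for i in range(1, n):
        # Vectorised scan over all earlier events 0..i-1
        dt = t[i] - t[:i]
        r = np.hypot(x[i] - x[:i], y[i] - y[:i])
        prox = dt * r**d * 10.0 ** (-b * m[:i])

        # Events at the same time cannot be parents
        prox[dt <= 0] = np.inf

        j = int(np.argmin(prox))
        if np.isfinite(prox[j]):
            parent[i] = j
            eta[i] = prox[j]
    return parent, eta


# Tiny example: the third event is far closer in space to the first
times = [0.0, 1.0, 2.0]
xs = [0.0, 10.0, 0.1]
ys = [0.0, 0.0, 0.0]
mags = [3.0, 3.0, 3.0]
parent, eta = nearest_parents(times, xs, ys, mags)
print(parent.tolist())
```

For 400k events this still evaluates on the order of 8 × 10¹⁰ pairs, so a real speed-up would likely also need some way to limit the candidate set, which this sketch does not attempt.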

Btw: Kudos to @NVSeismolab for making this implementation available. Highly appreciated.
