NVSeismoLab / eqclustering

Python module of earthquake clustering algorithm of Zaliapin et al.
Apache License 2.0

Implement SKLearn's Gaussian Mixture Model #1

Open JuanuMusic opened 6 years ago

JuanuMusic commented 6 years ago

Hi! I'm currently using this module in a project investigating earthquakes. I just forked the project and have already implemented some improvements to the code, but I think the biggest improvement would be to base the calculations on an optimized library such as scikit-learn.

When running the clustering method on a large data set (400k items), the algorithm is really slow. scikit-learn already implements the model used here: http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture
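For illustration, here is a minimal sketch of fitting scikit-learn's `GaussianMixture` to one-dimensional data. It assumes the quantity being modeled is the log10 of the nearest-neighbor distance and that the goal is a single threshold separating two modes; whether this matches what eqclustering does internally is not confirmed here. The function name `estimate_threshold` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def estimate_threshold(log_eta, random_state=0):
    """Hypothetical helper: fit a 2-component 1-D Gaussian mixture and
    return the value between the two component means where both
    components are (approximately) equally likely."""
    x = np.asarray(log_eta, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=random_state).fit(x)

    # Search only between the two fitted means for the posterior crossover.
    lo, hi = np.sort(gmm.means_.ravel())
    grid = np.linspace(lo, hi, 1001).reshape(-1, 1)
    post = gmm.predict_proba(grid)
    idx = int(np.argmin(np.abs(post[:, 0] - post[:, 1])))
    return float(grid[idx, 0])


# Synthetic, well-separated bimodal sample (made up for this example).
rng = np.random.default_rng(0)
log_eta = np.concatenate([
    rng.normal(-6.0, 0.5, 2000),  # stand-in for "clustered" events
    rng.normal(-2.0, 0.5, 2000),  # stand-in for "background" events
])
threshold = estimate_threshold(log_eta)
print(threshold)
```

The returned value could then be compared against a manually chosen threshold before relying on it.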

I'd be more than happy to implement it in the code, but I get a bit lost and don't really know where to plug it in.

How likely is it that you would implement it or maybe guide me in the process?

Great work!

mherrmann3 commented 2 years ago

In my opinion, the bottleneck of the nearest-neighbor (NN) method is not the mixture modeling (which is only needed if you do not provide a c in .prune()), but the call to t.grow(), which loops over every event i. For each event i, it has to consider all previous events as potential nearest neighbors, essentially making this an O(n²) operation. I tried to speed up BPFunction.__call__() and BPFunction.dist() with numba, but it complains, which is strange because these functions only use supported methods. Maybe someone else will be more successful, or has other ideas for speeding up the loop.
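One possible way to speed up the loop is to keep the outer loop over events but replace the inner scan over all previous events with a single NumPy array operation. The sketch below is not the package's `BPFunction` or `t.grow()`; it is a standalone illustration that assumes the space-time-magnitude distance η = Δt · r^d · 10^(−b·m) with the parent's magnitude m, planar x/y coordinates in place of great-circle distances, and events already sorted by time. The function name `nearest_parents` and the defaults `b=1.0`, `d=1.6` are hypothetical. The total work is still O(n²), but the inner part runs in compiled code.

```python
import numpy as np


def nearest_parents(t, x, y, m, b=1.0, d=1.6):
    """Hypothetical sketch: for each event j, find the earlier event i that
    minimizes eta = dt * r**d * 10**(-b * m_i). Assumes t is sorted in
    ascending order and x, y are planar coordinates (an assumption)."""
    t, x, y, m = (np.asarray(a, dtype=float) for a in (t, x, y, m))
    n = t.size
    parent = np.full(n, -1, dtype=int)  # -1 means "no parent found"
    eta = np.full(n, np.inf)

    for j in range(1, n):
        # Vectorized over all earlier events 0..j-1.
        dt = t[j] - t[:j]
        r = np.hypot(x[j] - x[:j], y[j] - y[:j])
        dist = dt * r**d * 10.0 ** (-b * m[:j])
        dist[dt <= 0] = np.inf  # simultaneous events cannot be parents

        i = int(np.argmin(dist))
        if np.isfinite(dist[i]):
            parent[j] = i
            eta[j] = dist[i]
    return parent, eta


# Tiny made-up catalog: one large event followed by three small ones.
t = [0.0, 1.0, 2.0, 10.0]
x = [0.0, 1.0, 50.0, 51.0]
y = [0.0, 0.0, 0.0, 0.0]
m = [5.0, 3.0, 3.0, 3.0]
parent, eta = nearest_parents(t, x, y, m)
print(parent)
```

In this toy catalog the third event links to the distant magnitude-5 event rather than to the closer magnitude-3 one, because the magnitude term outweighs the larger separation.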

Btw: Kudos to @NVSeismolab for making this implementation available. Highly appreciated.