Open · JuanuMusic opened this issue 6 years ago
In my opinion, the bottleneck of the NN method is not in the mixture modeling (only needed if you do not provide a `c` in `.prune()`), but in the call to `t.grow()`, which loops over every event i. For each event i, it has to consider all previous events as potential nearest neighbors, essentially making this an O(n²) operation. I tried to speed up `BPFunction.__call__()` and `BPFunction.dist()` with numba, but it complains; that's strange, because these functions only use supported methods. Maybe someone else has more success, or other ideas to speed up the loop.
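As an alternative to numba, the per-event Python loop can often be vectorized with NumPy broadcasting. Here is a minimal sketch assuming a simple Euclidean space-time metric (the actual `BPFunction.dist()` will differ, and `nearest_parents` is a hypothetical helper, not part of this repo); it keeps the O(n²) work but moves it out of the interpreter:

```python
import numpy as np

def nearest_parents(t, x, y):
    """For each event i, find the earlier event j < i minimizing a simple
    squared space-time distance (illustrative metric only).

    Vectorized with broadcasting instead of a per-event Python loop; still
    O(n^2) in time, and also O(n^2) in memory for the pairwise matrix.
    """
    n = len(t)
    # pairwise squared distances between all event pairs
    d = ((x[:, None] - x[None, :]) ** 2
         + (y[:, None] - y[None, :]) ** 2
         + (t[:, None] - t[None, :]) ** 2)
    # mask self-pairs and later events: only j < i are candidate parents
    d[np.triu_indices(n)] = np.inf
    parents = np.argmin(d, axis=1)
    parents[0] = -1  # the first event has no parent
    return parents
```

For 400k events the full pairwise matrix will not fit in memory, so one would process events in chunks, but the same broadcasting idea applies per chunk.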
Btw: Kudos to @NVSeismolab for making this implementation available. Highly appreciated.
Hi! I'm currently using this for a project investigating earthquakes. I just forked the project and have already implemented some improvements to the code, but I think the biggest improvement would be to base the calculations on a low-level package such as sklearn.
When running the clustering method on large data (400k items), the algorithm is really slow. sklearn already implements this model: http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture
I'd be more than happy to add it to the code, but I get a bit lost and don't really know where to plug it in.
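To illustrate what plugging in the linked class could look like: a sketch of fitting a two-component `sklearn.mixture.GaussianMixture` to a synthetic bimodal 1-D sample (standing in for the quantity the repo's mixture modeling operates on; the data and the midpoint threshold heuristic here are my assumptions, not the repo's method):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic bimodal 1-D data with two well-separated modes
# (illustration only; not real earthquake data).
rng = np.random.default_rng(0)
eta = np.concatenate([rng.normal(-6.0, 0.5, 500),   # first mode
                      rng.normal(-2.0, 0.5, 500)])  # second mode

# Fit a two-component Gaussian mixture; sklearn expects a 2-D array.
gmm = GaussianMixture(n_components=2, random_state=0).fit(eta.reshape(-1, 1))

# Simple heuristic threshold between the modes: midpoint of the fitted means.
threshold = gmm.means_.ravel().mean()
labels = gmm.predict(eta.reshape(-1, 1))
```

`GaussianMixture.fit` is heavily optimized C/NumPy code, so for 400k samples it should be far faster than a pure-Python EM loop.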
How likely is it that you would implement it, or could you guide me through the process?
Great work!