Closed Illviljan closed 2 years ago
Hi @Illviljan,
Thanks for your interest in ruptures!
Please feel free to propose a pull request implementing this with either numba or Cython. In that case, it would also be great to have a performance comparison with the native Python algorithms implemented in ruptures.
As for an accelerated version of pelt, if you are using a cost function that can be mapped to a kernel function (see here for the kernels available in ruptures and the related cost functions), you can use the already implemented Kernel Change Point Detection methods:
Hi @Illviljan, since this is more of a discussion than an issue, I am closing this. Please feel free to open a thread on this topic in the Discussions section of the repo. Thx!
Thanks for this remark @Illviljan.
Numba accelerations are not particularly suited for change-point detection procedures, as the underlying algorithm (dynamic programming) is basically two nested for loops, which cannot be vectorized. For a few cost functions (in your code, you tried l2), the inner for loop can be vectorized, but that cannot be generalized.
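To make the "two nested loops" remark concrete, here is a rough sketch of penalized dynamic programming with an l2 cost on a 1-D signal. It is not the ruptures implementation and has no pruning; the function name `opt_part_l2` and the test signal are made up for illustration. The outer loop must stay sequential because each step reads earlier results, while the inner loop over candidate last change points collapses into array operations thanks to cumulative sums, a trick specific to l2-like costs:

```python
import numpy as np


def opt_part_l2(signal, pen):
    """Penalized optimal partitioning with an l2 cost (hypothetical helper, no pruning)."""
    signal = np.asarray(signal, dtype=float)
    n = signal.shape[0]
    # Cumulative sums give the l2 cost of any segment in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(signal)))
    s2 = np.concatenate(([0.0], np.cumsum(signal**2)))

    best = np.empty(n + 1)  # best[t]: optimal penalized cost of signal[:t]
    best[0] = -pen  # so that the first segment is not penalized
    last = np.zeros(n + 1, dtype=int)  # last[t]: start of the final segment of signal[:t]

    # Outer loop: best[end] depends on best[:end], so it cannot be vectorized.
    for end in range(1, n + 1):
        # Inner loop over all candidate starts, written as array operations.
        starts = np.arange(end)
        seg_cost = (s2[end] - s2[starts]) - (s1[end] - s1[starts]) ** 2 / (end - starts)
        total = best[starts] + seg_cost + pen
        last[end] = np.argmin(total)
        best[end] = total[last[end]]

    # Backtrack from the end of the signal to recover the segment ends.
    bkps = []
    end = n
    while end > 0:
        bkps.append(int(end))
        end = last[end]
    return bkps[::-1]


rng = np.random.default_rng(0)
signal = np.concatenate((np.zeros(50), 5.0 * np.ones(50))) + rng.normal(0, 0.1, 100)
print(opt_part_l2(signal, pen=10.0))
```

For a cost without such a closed form, `seg_cost` would have to be computed one candidate at a time, which brings back the inner Python loop.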
That is why we chose to implement the most used methods in C (where for loops are fast). Maybe you can compare your implementation to the KernelCPD one.
ruptures uses a lot of (nested) for loops to detect change points, and for loops in Python are slow. The usual way of handling that is to vectorize as much as possible using numpy. But the algorithms in ruptures are quite complex, with a lot of dependencies on previous loops, so it isn't that easy (for me at least) to vectorize them more than they already are.
Now enter numba, which compiles Python code to fast machine code with (almost) a single decorator, so in Pelt it would simply look like this:
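A minimal sketch of what such a decorated function could look like. This is not the actual Pelt code from ruptures: the function names, the l2 cost, and the absence of pruning are all simplifying assumptions. The try/except fallback is only there so the snippet still runs, uncompiled, when numba is not installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: run as plain Python if numba is missing
    def njit(func):
        return func


@njit
def last_change_points(signal, pen):
    """Nested-loop dynamic programming with an l2 cost (hypothetical, no pruning)."""
    n = signal.shape[0]
    s1 = np.zeros(n + 1)  # cumulative sum of the signal
    s2 = np.zeros(n + 1)  # cumulative sum of the squared signal
    for i in range(n):
        s1[i + 1] = s1[i] + signal[i]
        s2[i + 1] = s2[i] + signal[i] * signal[i]

    best = np.full(n + 1, np.inf)
    best[0] = -pen
    last = np.zeros(n + 1, dtype=np.int64)
    for end in range(1, n + 1):  # plain loops: this is what numba compiles
        for start in range(end):
            seg = (s2[end] - s2[start]) - (s1[end] - s1[start]) ** 2 / (end - start)
            val = best[start] + seg + pen
            if val < best[end]:
                best[end] = val
                last[end] = start
    return last


def backtrack(last):
    """Turn the table of last change points into a list of segment ends."""
    bkps = []
    end = len(last) - 1
    while end > 0:
        bkps.append(int(end))
        end = last[end]
    return bkps[::-1]


rng = np.random.default_rng(0)
signal = np.concatenate((np.zeros(50), 5.0 * np.ones(50))) + rng.normal(0, 0.1, 100)
print(backtrack(last_change_points(signal, 10.0)))
```

The first call includes compilation time, so any timing comparison should measure a second call.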
Now, as always, it isn't quite that simple. The decorator works best if the functions are pretty much pure math/numpy-only code. So no dicts, classes or other advanced Python stuff. This usually just means that the heavy working functions need to have all variables in the args instead of hidden in self.
Nice intro on numba: https://youtu.be/ewaY9CcjLt0?t=59
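The point about moving state out of self can be sketched as follows. The class `JitCostL2` and the helper `_l2_error` are hypothetical names, loosely modelled on the `fit` / `error(start, end)` interface of ruptures cost classes; the heavy loop lives in a module-level function that receives everything it needs as arguments, and the method only forwards to it:

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: run as plain Python if numba is missing
    def njit(func):
        return func


@njit
def _l2_error(signal, start, end):
    """Sum of squared deviations from the mean on signal[start:end] (2-D array)."""
    total = 0.0
    for d in range(signal.shape[1]):
        mean = 0.0
        for t in range(start, end):
            mean += signal[t, d]
        mean /= end - start
        for t in range(start, end):
            diff = signal[t, d] - mean
            total += diff * diff
    return total


class JitCostL2:
    """Hypothetical cost class: state stays here, number crunching stays outside."""

    def fit(self, signal):
        # Store a 2-D float array of shape (n_samples, n_features).
        arr = np.asarray(signal, dtype=np.float64)
        self.signal = np.ascontiguousarray(arr.reshape(arr.shape[0], -1))
        return self

    def error(self, start, end):
        # Pass the array explicitly instead of letting the jitted code see self.
        return _l2_error(self.signal, start, end)


cost = JitCostL2().fit(np.array([1.0, 2.0, 3.0, 4.0]))
print(cost.error(0, 4))
```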
Here's some WIP testing code where I've gotten significantly faster results with numba: