Once you have found the change points, you usually want to apply some operation (such as `mean()`) to each group of samples between consecutive change points.
It's quite common to use pandas for that (`df.groupby("group_idx")`). But a dataframe requires `group_idx` to have the same shape as the signal, whereas ruptures only returns the list of change point indices.
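For comparison, here is a minimal sketch of how per-segment means are computed today, without a per-sample label: slice the signal between consecutive breakpoints. It assumes the breakpoint convention of ruptures' `predict()`, where each breakpoint is the exclusive end of a segment and the last one equals the number of samples. `segment_means` is a hypothetical helper name.

```python
import numpy as np


def segment_means(signal, bkps):
    """Mean of each segment between consecutive breakpoints (hypothetical helper).

    Assumes `bkps` is sorted, each value is the exclusive end of a segment,
    and the last value equals len(signal), as returned by ruptures' predict().
    """
    starts = [0] + list(bkps[:-1])  # each segment starts where the previous one ended
    return [float(np.mean(signal[start:end])) for start, end in zip(starts, bkps)]


signal = np.array([1.0, 1.0, 5.0, 5.0, 5.0, 9.0, 9.0])
print(segment_means(signal, [2, 5, 7]))  # [1.0, 5.0, 9.0]
```

Every additional statistic needs its own loop like this one, which is what passing a `group_idx` array to `groupby` avoids.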
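The suggestion is therefore, instead of returning a list of indices, to fill a `group_idx` array of the same length as the signal, labeling each sample with the change point at which its segment starts (0 for the first segment), and return that array. For example, for a signal of 7 samples: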
[2, 5] -> [0, 0, 2, 2, 2, 5, 5]
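A minimal sketch of that conversion, written the way the suggestion describes it: allocate the output once, then fill each segment's slice. The function name `bkps_to_group_idx` is hypothetical and not part of ruptures. It accepts breakpoints with or without the trailing signal length that ruptures appends.

```python
import numpy as np


def bkps_to_group_idx(bkps, n_samples):
    """Label each sample with the start index of its segment (hypothetical helper)."""
    group_idx = np.empty(n_samples, dtype=np.intp)  # allocated once, at its final size
    start = 0
    # Appending n_samples closes the last segment; if bkps already ends with
    # n_samples (ruptures' convention), the extra slice is simply empty.
    for end in list(bkps) + [n_samples]:
        group_idx[start:end] = start
        start = end
    return group_idx


print(bkps_to_group_idx([2, 5], 7))  # [0 0 2 2 2 5 5]
```

An equivalent vectorized form is `np.repeat(starts, ends - starts)` with `starts` and `ends` as NumPy arrays of segment boundaries; the explicit fill above just keeps the pre-allocation idea visible.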
This could also improve performance, as it's often faster to pre-allocate an array of the correct size and fill in values than to repeatedly resize it (when I've been benchmarking, `.append` has been one of the bottlenecks).
Matlab has a nice article explaining it better than I could.
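The issue does not say which `.append` was measured. Assuming NumPy's `np.append`, which allocates a new array and copies every existing element on each call, this sketch contrasts growing an array that way with filling a pre-allocated one. No timings are claimed here; they depend on the machine.

```python
import timeit

import numpy as np

N = 5_000


def grow_with_np_append(n):
    out = np.empty(0)
    for i in range(n):
        out = np.append(out, i)  # new allocation plus a full copy on every call
    return out


def fill_preallocated(n):
    out = np.empty(n)  # the final size is known, so allocate once
    for i in range(n):
        out[i] = i  # write in place, no reallocation
    return out


# Both approaches build the same array; only the cost differs.
assert np.array_equal(grow_with_np_append(N), fill_preallocated(N))

for fn in (grow_with_np_append, fill_preallocated):
    seconds = timeit.timeit(lambda: fn(N), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

Python's own `list.append` is amortized O(1), so the quadratic copying shown here is specific to `np.append`-style growth.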
This is quite a breaking change, so it would probably have to be optional for a while.
Hi, thanks for the nice suggestion. Indeed, this would be a breaking change. Nevertheless, we could add an optional argument to change the output format.
Examples of packages using the `group_idx` style: pandas, dask dataframes, numpy_groupies, flox.
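Once a label array exists, these tools consume it directly. A short sketch with pandas and with plain NumPy (`np.bincount`, in the spirit of numpy_groupies), using the labels from the `[2, 5]` example in the issue; the signal values are made up for illustration.

```python
import numpy as np
import pandas as pd

signal = np.array([1.0, 1.0, 5.0, 5.0, 5.0, 9.0, 9.0])  # illustrative values
group_idx = np.array([0, 0, 2, 2, 2, 5, 5])  # labels for breakpoints [2, 5]

# pandas: group_idx is just another column with the same length as the signal
df = pd.DataFrame({"signal": signal, "group_idx": group_idx})
means_pd = df.groupby("group_idx")["signal"].mean()
print(means_pd.tolist())  # [1.0, 5.0, 9.0]

# Plain NumPy: per-label sums and counts, then divide
sums = np.bincount(group_idx, weights=signal)
counts = np.bincount(group_idx)
present = counts > 0  # labels are start indices, so most bins are empty
means_np = sums[present] / counts[present]
print(means_np.tolist())  # [1.0, 5.0, 9.0]
```

numpy_groupies and flox provide ready-made aggregation functions built around such a label array; their exact signatures are best checked in their documentation.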