deepcharles / ruptures

ruptures: change point detection in Python
BSD 2-Clause "Simplified" License

Multi-Model Change Point Detection #5

Open Pete-York opened 5 years ago

Pete-York commented 5 years ago

Hi!

I am hoping to use ruptures for a project where I have a multi-dimensional or multivariate signal (I'm not 100% sure on the terminology) in which I expect some of the dimensions to follow different models (e.g. different cost functions). From what I've read of the survey and of the current ruptures implementation, it seems this isn't currently supported.

I have some ideas for how to do it and have started implementing it. Is this something you would be interested in having a pull request for? Do you have any suggestions of existing work in this area I may have overlooked?

Thanks for the package, by the way; it's great!

deepcharles commented 5 years ago

Hello,

Thanks for your message. That would be a great addition to ruptures, but there are several possible implementations. For instance, do you specify beforehand which cost function is applied to which dimensions? Something equivalent to:

  import ruptures as rpt

  # say we have a 4D signal in a numpy array called "signal"

  # split the signal into two 2D signals
  signal_subdim1 = signal[:, :2]
  signal_subdim2 = signal[:, 2:]
  # detect changes in each 2D signal, with a different cost for each
  breakpoints1 = rpt.Dynp(model="rbf").fit(signal_subdim1).predict(n_bkps=3)
  breakpoints2 = rpt.Dynp(model="l2").fit(signal_subdim2).predict(n_bkps=5)
  # merge the change points (sorted, without duplicates)
  breakpoints = sorted(set(breakpoints1 + breakpoints2))

How do you imagine it?

Thanks,

Charles

Pete-York commented 5 years ago

The way I currently have it implemented is pretty straightforward: there is a cost class called CombinatorialCost, which takes a list of costs whose length equals the dimensionality of your signal.

E.g.

import ruptures as rpt
from ruptures.costs import CostAR, CostL1, CostLinear

# say we have a 4D signal in a numpy array called "signal"
# CombinatorialCost is the new cost class described above
combinatorial_cost = CombinatorialCost([CostAR(), CostAR(), CostL1(), CostLinear()])

algo = rpt.Pelt(min_size=5, custom_cost=combinatorial_cost).fit(signal)
result = algo.predict(pen=penalty)

To calculate the cost of a segment, it simply sums, over the dimensions, each dimension's cost function evaluated on that segment.
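
Roughly, the class looks like this (a minimal sketch, assuming the custom-cost interface from the docs, i.e. subclassing ruptures.base.BaseCost and providing fit/error plus the model and min_size attributes; a cost like CostLinear, which needs covariate columns, would need special handling rather than a single-column slice):

from ruptures.base import BaseCost


class CombinatorialCost(BaseCost):
    """Sketch: one sub-cost per dimension, summed over dimensions per segment."""

    # required for compatibility with the search methods
    model = "combinatorial"
    min_size = 2  # in practice, take the max of the sub-costs' min_size

    def __init__(self, costs):
        # one cost instance per dimension of the signal
        self.costs = costs

    def fit(self, signal):
        # fit each sub-cost on its own dimension (kept as a 2D column)
        for dim, cost in enumerate(self.costs):
            cost.fit(signal[:, dim:dim + 1])
        return self

    def error(self, start, end):
        # segment cost = sum of the per-dimension costs on [start, end)
        return sum(cost.error(start, end) for cost in self.costs)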

In terms of issues with this approach, the one that immediately jumps out is normalising cost magnitudes: different cost functions can have very different scales and variances, e.g. CostL1 may be of a very different magnitude to CostAR, depending on the dataset.
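
Just as one possible illustration (not something I'm committed to), each sub-cost could be rescaled by its error over the whole signal, so that every dimension contributes on a roughly comparable scale:

class NormalisedCombinatorialCost(CombinatorialCost):
    """Variant of the sketch above: rescale each sub-cost by its full-signal error."""

    def fit(self, signal):
        super().fit(signal)
        n_samples = signal.shape[0]
        # error of each sub-cost over the whole, unsegmented signal,
        # used as a crude per-dimension scale (fall back to 1.0 if it is zero)
        self.scales = [cost.error(0, n_samples) or 1.0 for cost in self.costs]
        return self

    def error(self, start, end):
        # normalised sum: each dimension contributes on a comparable scale
        return sum(
            cost.error(start, end) / scale
            for cost, scale in zip(self.costs, self.scales)
        )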

I can make a pull request so you can see what I have so far if you like.

deepcharles commented 5 years ago

That sounds great!