deepcharles / ruptures

ruptures: change point detection in Python
BSD 2-Clause "Simplified" License
1.59k stars 162 forks source link

Can ruptures identify signals as no change point? #200

Closed manburst closed 2 years ago

manburst commented 2 years ago

Hello I have used your package for my research work to detect the change point but not all of the signal input has a change point here is the question.

The first figure shows the region of signal with a change point rapture can accurately detect the change point as a red label image The second figure shows the region of signal without a change point as a red label image

I have to input all of those region signals to ruptures to let them detect the change point without knowing that those region signals have a change point or not. (There are too many signals I cannot categorize them all manually)

Any setting to classify those without a change point? (Maybe report as no change point)

One more question Here is code for detecting a change point in region signal

import matplotlib.pyplot as plt
import ruptures as rpt
import numpy as np

data=[680, 655, 664, 652, 653, 696, 639, 668, 661, 694, 661, 649, 573, 597, 588, 581, 580, 578, 558, 559, 572, 579, 566, 578, 577, 566, 554, 571, 590, 571, 584, 580, 584, 577, 576, 578, 592, 581, 577, 581, 578, 595, 579, 579, 574, 580, 569, 585, 571, 565, 568, 566, 560, 571, 571, 541, 561, 561, 559, 563, 554, 550, 570, 576, 558, 574, 566, 568, 567, 582, 566, 570, 572, 565, 559, 570, 572, 567, 561, 561, 553, 563, 549, 563, 557, 557, 566, 563, 569, 550, 549, 556, 555, 539, 558, 542, 573, 564, 561, 553, 565, 554, 559, 554, 551, 549, 544, 553, 553, 540, 561, 554, 565, 553, 560, 568, 565, 568, 571, 573, 565, 534, 545, 548, 561, 551, 569, 555, 568, 570, 571, 569, 571, 566, 566, 573, 569, 575, 571, 571, 577, 559, 564, 561, 572, 565, 568, 565, 562, 556, 570, 565, 548, 566, 571, 562, 572, 559, 575, 559, 571, 567, 564, 560, 564, 549, 571, 559, 556, 549, 561, 566, 575, 566, 595, 628, 675, 683, 701, 673, 705, 706, 703, 743, 748, 787, 710, 787, 749, 790, 823, 800, 836, 833, 813, 837, 785, 811, 823, 818, 821, 815, 838, 837, 827, 822, 807, 830, 842, 840, 808, 833, 837, 814, 823, 843, 830, 800, 826, 821, 809, 807, 815, 820, 819, 802, 774, 770, 777, 775, 763, 764, 757, 759, 763, 780, 808, 809, 805, 830, 822, 822, 814, 833, 804, 823, 805, 804, 808, 815, 819, 820, 832, 796, 834, 820, 826, 835, 814, 823, 826, 826, 829, 818, 825, 809, 812, 833, 825, 823, 818, 810, 797, 765, 783, 747, 764, 750, 784, 780, 771, 785, 766, 783, 772, 774, 751, 770, 773, 781, 769, 782, 760, 746, 770, 760, 779, 774, 776, 745, 774, 798, 770, 751, 761, 791, 773, 756, 784, 784, 766, 751, 721, 761, 768, 738, 750, 759, 773, 772, 764, 771, 699, 774, 754, 750, 768, 758, 768, 768, 767, 756, 746, 733, 764, 784, 775, 759, 760, 763, 776, 768, 773, 756, 766, 746, 784, 774, 767, 746, 779, 760, 768, 735, 735, 739, 762, 779, 746, 750, 777, 760, 769, 746, 730, 774, 792, 768, 757, 734, 754, 732, 752, 736, 749, 715, 752, 737, 721, 723, 723, 764, 754, 768, 763, 744, 721, 707, 718, 725, 708, 700, 716, 690, 714, 712, 717, 720, 748, 715, 741, 710, 695, 726, 763, 717, 739, 723, 728, 696, 728, 731, 757, 723, 702, 733, 726, 702, 666, 660, 679, 662, 675, 651, 654]
signal=np.array(data)
algo = rpt.Pelt(model="rbf").fit(signal)
result = algo.predict(pen=10)
rpt.display(signal, result)

image

The result is [180, 275, 425] Why does the result show 3 change points? It is supposed to label the light red from index 180 to 425.

Thanks in advance

oboulant commented 2 years ago

Hi @manburst ,

Thx for your interest in ruptures !

I did not quite get you first question. Could you clarify ?

As for the code snippet, it actually gives you 2 change points. Indeed, as mentioned in the doc, a technical convention in ruptures ensures that the last element of the array is the number of samples.

In your code,

One solution for having less change points detected, is to increase the value of the penalty. For instance, the very same code you provided but with a penalty of 40 returns a single change point

[...]
result = algo.predict(pen=40)
print(results)
rpt.display(signal, result)

outputs

[175, 425]

Figure_1

Another solution if you know a priori the number of change points to be detected, you can use the dynamic programming search method :

import matplotlib.pyplot as plt
import ruptures as rpt
import numpy as np

data=[680, 655, 664, 652, 653, 696, 639, 668, 661, 694, 661, 649, 573, 597, 588, 581, 580, 578, 558, 559, 572, 579, 566, 578, 577, 566, 554, 571, 590, 571, 584, 580, 584, 577, 576, 578, 592, 581, 577, 581, 578, 595, 579, 579, 574, 580, 569, 585, 571, 565, 568, 566, 560, 571, 571, 541, 561, 561, 559, 563, 554, 550, 570, 576, 558, 574, 566, 568, 567, 582, 566, 570, 572, 565, 559, 570, 572, 567, 561, 561, 553, 563, 549, 563, 557, 557, 566, 563, 569, 550, 549, 556, 555, 539, 558, 542, 573, 564, 561, 553, 565, 554, 559, 554, 551, 549, 544, 553, 553, 540, 561, 554, 565, 553, 560, 568, 565, 568, 571, 573, 565, 534, 545, 548, 561, 551, 569, 555, 568, 570, 571, 569, 571, 566, 566, 573, 569, 575, 571, 571, 577, 559, 564, 561, 572, 565, 568, 565, 562, 556, 570, 565, 548, 566, 571, 562, 572, 559, 575, 559, 571, 567, 564, 560, 564, 549, 571, 559, 556, 549, 561, 566, 575, 566, 595, 628, 675, 683, 701, 673, 705, 706, 703, 743, 748, 787, 710, 787, 749, 790, 823, 800, 836, 833, 813, 837, 785, 811, 823, 818, 821, 815, 838, 837, 827, 822, 807, 830, 842, 840, 808, 833, 837, 814, 823, 843, 830, 800, 826, 821, 809, 807, 815, 820, 819, 802, 774, 770, 777, 775, 763, 764, 757, 759, 763, 780, 808, 809, 805, 830, 822, 822, 814, 833, 804, 823, 805, 804, 808, 815, 819, 820, 832, 796, 834, 820, 826, 835, 814, 823, 826, 826, 829, 818, 825, 809, 812, 833, 825, 823, 818, 810, 797, 765, 783, 747, 764, 750, 784, 780, 771, 785, 766, 783, 772, 774, 751, 770, 773, 781, 769, 782, 760, 746, 770, 760, 779, 774, 776, 745, 774, 798, 770, 751, 761, 791, 773, 756, 784, 784, 766, 751, 721, 761, 768, 738, 750, 759, 773, 772, 764, 771, 699, 774, 754, 750, 768, 758, 768, 768, 767, 756, 746, 733, 764, 784, 775, 759, 760, 763, 776, 768, 773, 756, 766, 746, 784, 774, 767, 746, 779, 760, 768, 735, 735, 739, 762, 779, 746, 750, 777, 760, 769, 746, 730, 774, 792, 768, 757, 734, 754, 732, 752, 736, 749, 715, 752, 737, 721, 723, 723, 764, 754, 768, 763, 744, 721, 707, 718, 725, 708, 700, 716, 690, 714, 712, 717, 720, 748, 715, 741, 710, 695, 726, 763, 717, 739, 723, 728, 696, 728, 731, 757, 723, 702, 733, 726, 702, 666, 660, 679, 662, 675, 651, 654]
signal=np.array(data)
algo = rpt.Dynp(model="rbf", min_size=1, jump=1).fit(signal)
result = algo.predict(n_bkps=1)
print(results)
rpt.display(signal, result)

outputs

[176, 425]

Figure_2

manburst commented 2 years ago

Thank you for your answer! To clarify the first question Here is a signal which I determine them with a change point image and here is a signal which I determine them without a change point image Those signals can categorize manually by eye which one has change point or not. The problem is I cannot manually categorize them all because there are more than 10k signals and I have to input both of them all of 10k signals to let rupture find the change point so the signal that I determine them without a change point will also get the change point which I didn't want it, I only want to find the change point of signal with a change point only.

Hope it is clear. Thanks in advance.

deepcharles commented 2 years ago

Hello,

This task amounts to finding a good penalty value. For instance, take the two following signals.

Figure_1 Figure_2

The first signal is noisy with a single change-point. The second signal is only noise. I then run the following code.

penalty_value = 10

print(rpt.Pelt(jump=1).fit(signal_1).predict(pen=penalty_value))
# Output: [50, 100]

print(rpt.Pelt(jump=1).fit(signal_2).predict(pen=penalty_value))
# Output: [100]

For a given penalty (here, 10), the first detection yields a single change point, and the second detection yields no change-point.

In your case, you have to find the right penalty value. There are two solutions.

  1. Manual calibration Try a range a values and keep the one which agrees with what you expect. I imagine that yours signals do not have the same length, so it is usually better to find a value which is proportional to log(n_samples). Something like
# let's try a value on several signals

for signal in signal_list:
    n_samples = signal.shape[0]
    penalty_value = value_to_find * np.log(n_samples)
    predicted_change_points = rpt.Pelt(jump=1).fit(signal).predict(pen=penalty_value)
    # do something to assess the validity of the prediction
    ...
  1. Automatic calibration (This is a shameless plug of a work I did a few years ago and that I intend to integrate soon in ruptures.) Instead of manually finding the best value, simply annotate a few signals, meaning that you write down the change-points you want for each signal. For the signals above, this would be [50, 100] and [100] respectively. Then run the algorithm in [1] to find the optimal penalty value. Usually it takes very few signals to learn a good penalty. If you need more information, feel free to contact me.

Hope that helps.

References

[1] Truong, C., Oudre, L., & Vayatis, N. (2017). Penalty learning for changepoint detection. Proceedings of the European Signal Processing Conference (EUSIPCO), 1569–1573. [doi] [pdf]

manburst commented 2 years ago

Thanks! that help a lot.