Segmentation of repeating segments

brandon-hastings commented 2 months ago

Hi, I am currently working with simulated data that has a baseline frequency, with multiple segments of an abnormal frequency placed at random positions throughout the time series. For simplicity, the abnormal-frequency segments all have the same length and contain exactly the same observations, giving a segment sequence like ABABA. In real data, however, it may look more like ABACAD, where the abnormal-frequency segments have similar observations but different lengths.
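For anyone who wants to reproduce this kind of input, here is a minimal sketch of how an ABABA series could be generated. It is not the script used for the attached plots: the function name, the sine-wave shape, and all periods and lengths are assumptions chosen for illustration.

```python
import math


def make_repeating_series(base_period=50, abnormal_period=10,
                          base_len=400, abnormal_len=100, n_abnormal=2):
    """Build an A B A B ... A series of sine segments (hypothetical helper).

    Returns the series as a list of floats and the list of true change
    point indices (the first index of every segment except the first).
    """
    series = []
    change_points = []
    for i in range(2 * n_abnormal + 1):
        is_abnormal = i % 2 == 1  # odd positions are the abnormal segments
        period = abnormal_period if is_abnormal else base_period
        length = abnormal_len if is_abnormal else base_len
        if i > 0:
            change_points.append(len(series))
        # The phase restarts in every segment, so all abnormal segments
        # contain exactly the same observations.
        series.extend(math.sin(2 * math.pi * t / period) for t in range(length))
    return series, change_points


series, true_cps = make_repeating_series()
print(len(series), true_cps)
```

Varying `abnormal_len` per segment would move this toward the ABACAD case described above.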

When I run this data through ClaSPy, it does not detect the change points, and I was wondering whether anything can be done about that. I know the paper identifies this as a more difficult situation. Following the advice in the paper and some of the documentation, I am running with the following parameters:

distance=euclidean_distance
window_size=5
n_estimators=30

I believe that n_estimators was what n_iters was referring to in section 4.5 for setting the temporal constraint, but let me know if this is incorrect.

I am also attaching the plots of my original inputs and the clasp score for each time series below, where a green dashed line is a change point returned by ClaSPy.

Time series: mult_5_1e-15_timeseries (image)
ClaSP profiles: mult_5_1e-15_profiles (image)

ermshaua commented 2 months ago

I suggest setting the number of segments manually (for now) and setting CP validation to None. Also try different values for the window size (maybe 10, 20, 50). Does this help?

brandon-hastings commented 2 months ago

Unfortunately not. Setting validation to None does identify more change points, but not necessarily the correct ones in each time series. When I specify the number of segments manually, it tends to predict the maximum number of change points allowed for the top two time series and to underpredict for the bottom one. Different window sizes gave no noticeable improvement.
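To make "correct change points" measurable on simulated data, one option is to match predicted change points to the known ones within a tolerance. The sketch below is one possible way to do that, not something ClaSPy provides; the function name, the greedy matching rule, and the example numbers are all assumptions.

```python
def match_change_points(true_cps, predicted_cps, tolerance):
    """Greedily match each true change point to its nearest unused prediction.

    A prediction counts as a hit if it lies within `tolerance` samples of a
    true change point. Returns (hits, precision, recall).
    """
    unmatched = sorted(predicted_cps)
    hits = 0
    for cp in sorted(true_cps):
        candidates = [p for p in unmatched if abs(p - cp) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda p: abs(p - cp))
            unmatched.remove(best)  # each prediction can be used only once
            hits += 1
    precision = hits / len(predicted_cps) if predicted_cps else 0.0
    recall = hits / len(true_cps) if true_cps else 0.0
    return hits, precision, recall


# Hypothetical numbers, for illustration only.
hits, precision, recall = match_change_points(
    true_cps=[400, 500, 900, 1000],
    predicted_cps=[395, 650, 1002],
    tolerance=10,
)
print(hits, precision, recall)
```

A low recall here would correspond to underprediction, and a low precision to extra change points in the wrong places.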

ermshaua commented 2 months ago

Hmm, strange. Can you send me one of the TS to the email address in my paper? I'd like to have a look myself.

ermshaua commented 2 months ago

BinaryClaSPSegmentation(n_segments=7, validation=None) works quite well for the first two TS. I will have a deeper look at a later point in time to optimize ClaSP for such TS.
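For reference, the call above could be wrapped roughly as follows. This is a sketch under assumptions: the wrapper name and the window-size sweep are illustrative, it assumes `claspy` and `numpy` are installed, and `window_size` is only passed because different values were suggested earlier in this thread.

```python
# Window sizes suggested earlier in this thread.
WINDOW_SIZES = (10, 20, 50)


def segment_with_fixed_count(time_series, n_segments=7, window_size=10):
    """Run ClaSP with a fixed number of segments and validation switched off.

    Hypothetical wrapper; the imports are local so this module still loads
    where claspy is not installed.
    """
    import numpy as np
    from claspy.segmentation import BinaryClaSPSegmentation

    clasp = BinaryClaSPSegmentation(
        n_segments=n_segments,    # fixed manually instead of learned
        window_size=window_size,
        validation=None,          # skip change point validation
    )
    change_points = clasp.fit_predict(np.asarray(time_series, dtype=float))
    return [int(cp) for cp in change_points]


def sweep_window_sizes(time_series, n_segments=7):
    """Collect the predicted change points for each suggested window size."""
    return {
        w: segment_with_fixed_count(time_series, n_segments, w)
        for w in WINDOW_SIZES
    }
```

Calling `sweep_window_sizes(ts)` on one of the time series returns a dictionary keyed by window size, which makes it easy to compare the predicted change points side by side.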
