deepcharles / ruptures

ruptures: change point detection in Python

binseg incorrect for normal model with 8 data points #244

Closed tdhock closed 2 years ago

tdhock commented 2 years ago

Hi again @deepcharles, I tried fitting a normal model with binary segmentation, but I sometimes observe that the predict method returns fewer changepoints than requested. For example, with predict(n_bkps=3) I expect a list of 4 segment ends, but I only get 2. Is this a bug? Here is a simple example:

import ruptures as rpt
import numpy as np
rpt.version.version
data_list = [0,0.3,0.2,0.1, 10,11,12,13]
N_data = len(data_list)
data_mat = np.array(data_list).reshape(N_data,1)
algo = rpt.Binseg(model="normal",jump=1).fit(data_mat)
computed_break_dict={n_bkps:algo.predict(n_bkps=n_bkps) for n_bkps in range(4)}
computed_break_dict
expected_break_dict = {
    0:[8],
    1:[4,8],
    2:[4,6,8],
    3:[2,4,6,8]
    }

Output on my system:

Python 3.7.6 | packaged by conda-forge | (default, Jun  1 2020, 18:11:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ruptures as rpt
>>> import numpy as np
>>> rpt.version.version
'1.1.6'
>>> data_list = [0,0.3,0.2,0.1, 10,11,12,13]
>>> N_data = len(data_list)
>>> data_mat = np.array(data_list).reshape(N_data,1)
>>> algo = rpt.Binseg(model="normal",jump=1).fit(data_mat)
c:\Users\th798\miniconda3\envs\cs570s22\lib\site-packages\ruptures\costs\costnormal.py:32: UserWarning: New behaviour in v1.1.5: a small bias is added to the covariance matrix to cope with truly constant segments (see PR#198).
  UserWarning,
>>> computed_break_dict={n_bkps:algo.predict(n_bkps=n_bkps) for n_bkps in range(4)}
>>> computed_break_dict
{0: [8], 1: [4, 8], 2: [4, 8], 3: [4, 8]}
>>> expected_break_dict = {
...     0:[8],
...     1:[4,8],
...     2:[4,6,8],
...     3:[2,4,6,8]
...     }
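
For reference, here is a minimal sketch of why the expected breakpoints above seem reasonable. It is not the ruptures implementation: `gaussian_cost`, `binseg_sketch`, the `eps` bias, and `min_size=2` are assumptions made for illustration, using a simplified univariate Gaussian cost of n * log(variance + eps) and a greedy search over all current segments. The last lines flag which requested `n_bkps` values came back short in the ruptures output reported above.

```python
import math


def gaussian_cost(values, eps=1e-6):
    """Simplified Gaussian segment cost: n * log(variance + eps).

    The small eps (an assumption here) keeps the log finite when a
    segment is exactly constant.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return n * math.log(var + eps)


def binseg_sketch(signal, n_bkps, min_size=2):
    """Greedy binary segmentation; returns the sorted list of segment ends."""
    ends = [len(signal)]
    for _ in range(n_bkps):
        best_gain, best_split = None, None
        start = 0
        # Try every admissible split inside every current segment.
        for end in ends:
            whole = gaussian_cost(signal[start:end])
            for split in range(start + min_size, end - min_size + 1):
                gain = (
                    whole
                    - gaussian_cost(signal[start:split])
                    - gaussian_cost(signal[split:end])
                )
                if best_gain is None or gain > best_gain:
                    best_gain, best_split = gain, split
            start = end
        if best_split is None:
            # No segment is long enough to split again.
            break
        ends = sorted(ends + [best_split])
    return ends


signal = [0, 0.3, 0.2, 0.1, 10, 11, 12, 13]
sketch_break_dict = {k: binseg_sketch(signal, k) for k in range(4)}
print(sketch_break_dict)
# {0: [8], 1: [4, 8], 2: [4, 6, 8], 3: [2, 4, 6, 8]}

# Output reported above for ruptures 1.1.6, copied from the session.
reported_break_dict = {0: [8], 1: [4, 8], 2: [4, 8], 3: [4, 8]}
# A result for n_bkps=k should contain k + 1 segment ends.
short = [k for k, bkps in reported_break_dict.items() if len(bkps) != k + 1]
print(short)
# [2, 3]
```

Under these assumptions the sketch returns k + 1 segment ends for every k in range(4), while the reported output falls short for n_bkps=2 and n_bkps=3.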

deepcharles commented 2 years ago

Thanks. We'll look into it