deepcharles / ruptures

ruptures: change point detection in Python
BSD 2-Clause "Simplified" License

Docs: Penalty term explainer #275

Open thorbjornwolf opened 1 year ago

thorbjornwolf commented 1 year ago

Super neat library! The API feels very well-designed 🤩

Reading the documentation, I miss a couple of things. One of them is a central description of what pen is, together with a general strategy for setting it or at least getting the right order of magnitude, or an explanation of why no such strategy exists.

Perhaps something like (modified from #271)

The penalty value is a positive float that controls how many change points are detected (higher values yield fewer change points). The right value depends heavily on your data, but as a rule of thumb pen can be initialized to around [rule of thumb] and tweaked up and down from there.
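To make the "higher values yield fewer change points" claim concrete, here is a minimal sketch of penalized segmentation in plain Python. It is not the ruptures implementation: `l2_cost_factory` and `penalized_breakpoints` are hypothetical helpers that minimise the total squared-error cost plus `pen` per segment by exhaustive dynamic programming on a toy 1D signal. The returned list ends with the signal length, mirroring the convention ruptures uses for breakpoints.

```python
from itertools import accumulate


def l2_cost_factory(signal):
    """Return cost(start, end): squared deviation from the mean on signal[start:end]."""
    s1 = [0.0] + list(accumulate(signal))
    s2 = [0.0] + list(accumulate(x * x for x in signal))

    def cost(start, end):
        total = s1[end] - s1[start]
        return (s2[end] - s2[start]) - total * total / (end - start)

    return cost


def penalized_breakpoints(signal, pen):
    """Minimise (sum of segment costs) + pen * (number of segments).

    Hypothetical helper for illustration only; O(n^2), no pruning.
    """
    n = len(signal)
    cost = l2_cost_factory(signal)
    best = [0.0] + [float("inf")] * n  # best[t]: optimal penalized cost of signal[:t]
    last = [0] * (n + 1)               # last[t]: start of the final segment of signal[:t]
    for end in range(1, n + 1):
        for start in range(end):
            candidate = best[start] + cost(start, end) + pen
            if candidate < best[end]:
                best[end], last[end] = candidate, start
    # Walk back through the stored segment starts to recover the breakpoints.
    bkps, t = [], n
    while t > 0:
        bkps.append(t)
        t = last[t]
    return sorted(bkps)


# Toy signal: three flat segments of 20 samples with a small deterministic wiggle.
signal = [0.0, 0.2] * 10 + [5.0, 5.2] * 10 + [1.0, 1.2] * 10

print(penalized_breakpoints(signal, pen=1.0))     # [20, 40, 60]
print(penalized_breakpoints(signal, pen=1000.0))  # [60]
```

With a small penalty both jumps are recovered; with a penalty larger than the cost reduction any split can buy, the whole signal is kept as one segment. The useful range for pen therefore scales with the cost function and the noise level, which is why a single universal default is hard to give.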

Existing work

In Binseg and sibling models there's this magic incantation:

my_bkps = algo.predict(pen=np.log(n) * dim * sigma**2)

Was it produced with some rule of thumb? In the advanced usage, kernel article, it is set twice to values two orders of magnitude apart:

penalty_value = 100  # beta
penalty_value = 1  # beta

The suggestion in #271 for reading an article is fine; what I lack is a paragraph or two somewhere visible. The penalty term seems important enough to be worth it.

deepcharles commented 1 year ago

Very good point. I'll leave the issue open to remind me to add it to the docs (shortly, hopefully).

tg12 commented 8 months ago

I found this to be the rule of thumb.

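  import numpy as np
  # p and time_series_len must be defined first; p is presumably the number of model parameters
  penalty_method_dict = {
      'SIC': p * np.log(time_series_len),
      'BIC': p * np.log(time_series_len),
      'AIC': p * 2,
      'Hannan-Quinn': 2 * p * np.log(np.log(time_series_len)),
  }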
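The formulas above can be wrapped in a small helper, sketched below with only the standard library. Two things here are assumptions rather than anything stated in this thread: `n_params` is read as the number of parameters estimated per segment, and the optional `sigma` rescales the penalties by the noise variance so they sit on the same scale as a sum-of-squares cost. Under that reading, the BIC entry with `n_params = dim` matches the `np.log(n) * dim * sigma**2` expression quoted from the Binseg docs. The function name and the example values are purely illustrative.

```python
import math


def rule_of_thumb_penalties(n_samples, n_params, sigma=1.0):
    """Information-criterion style penalties per change point.

    Assumptions (not from the ruptures docs): n_params is the number of
    parameters per segment, and sigma is the noise standard deviation used
    to rescale the penalties for a squared-error cost.
    Requires n_samples >= 3 so that log(log(n_samples)) is positive.
    """
    scale = sigma ** 2
    log_n = math.log(n_samples)
    return {
        "AIC": scale * 2 * n_params,
        "BIC": scale * n_params * log_n,  # SIC is the same criterion
        "Hannan-Quinn": scale * 2 * n_params * math.log(log_n),
    }


# Illustrative values only: 500 samples, 3 dimensions, noise std of 5.
penalties = rule_of_thumb_penalties(n_samples=500, n_params=3, sigma=5.0)
for name, value in penalties.items():
    print(f"{name}: {value:.1f}")
```

These values are starting points, not guarantees: they give an order of magnitude for pen, which can then be tuned up or down while checking the resulting segmentation.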

@deepcharles Amazing work on the original lib! Great work.