deepcharles / ruptures

ruptures: change point detection in Python
BSD 2-Clause "Simplified" License
1.59k stars 162 forks source link

fix(CostNormal): add small bias in covariance matrix #198

Closed deepcharles closed 2 years ago

deepcharles commented 3 years ago

On signals with truly constant segments, the CostNormal detection fails because the covariance matrix is badly conditioned, resulting in infinite value for the cost function. See #196.

deepcharles commented 3 years ago

I still need to add tests for the case add_small_bias=False

codecov[bot] commented 2 years ago

Codecov Report

Merging #198 (45204a6) into master (608d789) will increase coverage by 0.11%. The diff coverage is 100.00%.

:exclamation: Current head 45204a6 differs from pull request most recent head cd340cd. Consider uploading reports for the commit cd340cd to get more accurate results Impacted file tree graph

@@            Coverage Diff             @@
##           master     #198      +/-   ##
==========================================
+ Coverage   96.59%   96.70%   +0.11%     
==========================================
  Files          40       40              
  Lines         969      972       +3     
==========================================
+ Hits          936      940       +4     
+ Misses         33       32       -1     
Impacted Files Coverage Δ
src/ruptures/exceptions.py 100.00% <ø> (ø)
src/ruptures/utils/bnode.py 100.00% <ø> (+3.84%) :arrow_up:
src/ruptures/costs/costnormal.py 100.00% <100.00%> (ø)
src/ruptures/detection/binseg.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 608d789...cd340cd. Read the comment docs.

deepcharles commented 2 years ago

In CostNormal, I have added a small diagonal matrix (epsilon*Identity, where epsilon is very small) to the covariance matrix to prevent cases when the log-determinant of the covariance matrix was infinite. The detection was failing whenever there was a truly constant segment in the signal.

This will be the default behaviour starting version 1.1.5. To prevent this new behaviour (i.e. set epsilon to 0), the user can simply do the following (change Dynp by the search method you need).

c = rpt.costs.CostNormal(add_small_diag=False)
algo = rpt.Dynp(custom_cost=c)
# or, equivalently,
algo = rpt.Dynp(model="normal", params={"add_small_diag": False})

The change has been added to the documentation.

Also, I have slightly modified the behaviour of ruptures.utils.Bnode (see 5f68193248d572c601585dcd89727aaa819fdce6) because, for some reason, the coverage decreased because of the new behaviour. I believe the if case I removed was not useful. In any case, this does not seem to modify in any way the behaviour of ruptures.BottomUp.

deepcharles commented 2 years ago

Because the new behaviour might change the detection of users, I have created a UserWarning (see 9d9b744c70e45a89445e7726a4f5bfcbd2ca74f6). This adds an annoying warning but I feel that we should warn users that something has changed whenever they use CostNormal. What do you think @oboulant ?

oboulant commented 2 years ago

I agree that users should be informed/notified in case of a change in the behaviour.

Here, the warning is emitted in all cases.

I am just wondering if we cannot limit the cases when the warning should be emit.

deepcharles commented 2 years ago

Quick question though : A user explicitly setting the value to False would not need the warning, am I right ?

Yes. I will remove the warning if the old behaviour is used.

As for your second point, I also think that the new behaviour should be the default one. Being able to correctly detect truly constant segments is a normal thing to expect by default. Normally it should not change anything but we never know, hence the warning.

deepcharles commented 2 years ago

I reverted some previous commits (in cedc408). So now, if a user specifies add_small_diag=False in CostNormal, he will get the exact same results with all search methods as before.