deepcharles / ruptures

ruptures: change point detection in Python
BSD 2-Clause "Simplified" License
1.6k stars 163 forks source link

fix(datasets): propagate random seed for consistency #221

Closed oboulant closed 2 years ago

oboulant commented 2 years ago

Description

This PR fixes an issue occurring while using the signal generation feature of ruptures with a random seed. Two runs with the same random seed would not return the same generated signal.

How to reproduces

import ruptures as rpt

signal1, bkps1 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=1234)
signal2, bkps2 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=1234)

assert np.isclose(signal1, signal2).all()
assert bkps1 == bkps2

Work done

codecov[bot] commented 2 years ago

Codecov Report

Merging #221 (afd9e74) into master (a39f031) will increase coverage by 2.05%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #221      +/-   ##
==========================================
+ Coverage   96.71%   98.76%   +2.05%     
==========================================
  Files          40       40              
  Lines         973      972       -1     
==========================================
+ Hits          941      960      +19     
+ Misses         32       12      -20     
Flag Coverage Δ
unittests 98.76% <100.00%> (+2.05%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/ruptures/datasets/pw_constant.py 100.00% <100.00%> (ø)
src/ruptures/datasets/pw_linear.py 100.00% <100.00%> (ø)
src/ruptures/datasets/pw_normal.py 100.00% <100.00%> (ø)
src/ruptures/datasets/pw_wavy.py 100.00% <100.00%> (ø)
src/ruptures/utils/drawbkps.py 100.00% <100.00%> (ø)
src/ruptures/show/display.py 91.17% <0.00%> (+58.82%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a39f031...afd9e74. Read the comment docs.

deepcharles commented 2 years ago

The current random number generation is not well designed indeed. But two consecutive rpt.pw_constant should not produce the same signal (unless specified otherwise) because it would be impossible to generate two different signals. (edit: PR description was corrected.)

To me, the correct behaviour should be as follows.

Case 1 (default behaviour): no seed is specified (i.e. seed=None).

import numpy as np

import ruptures as rpt

signal1, bkps1 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1)
signal2, bkps2 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1)

Here, (signal1, bkps1) and (signal2, bkps2) should be different.

Case 2: a seed is specified.

signal1, bkps1 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=12345)
signal2, bkps2 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=12345)

Here, (signal1, bkps1) and (signal2, bkps2) should the same.

Case 3: a random generator is specified.

rng = np.random.default_rng(seed=12345)
signal1, bkps1 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=rng)
signal2, bkps2 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=rng)

rng = np.random.default_rng(seed=12345)
signal1_bis, bkps1_bis = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=rng)
signal2_bis, bkps2_bis = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=rng)

(signal1, bkps1) and (signal2, bkps2) should different but (signal1, bkps1) and (signal1_bis, bkps1_bis) should be the same, as well as (signal2, bkps2) and (signal2_bis, bkps2_bis)

deepcharles commented 2 years ago

It seems that the behaviour I described is what your code does (and not the one in the PR description). Can you change the PR description?

oboulant commented 2 years ago

It seems that the behaviour I described is what your code does (and not the one in the PR description). Can you change the PR description?

My mistake for the initial PR description, I forgot to mention that seed was initially fixed ! Sorry about that, description is fixed !