oboulant commented 2 years ago

Description

This PR fixes an issue occurring while using the signal generation feature of ruptures with a random seed. Two runs with the same random seed would not return the same generated signal.

How to reproduces

import ruptures as rpt

signal1, bkps1 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=1234)
signal2, bkps2 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=1234)

assert np.isclose(signal1, signal2).all()
assert bkps1 == bkps2

Work done

[x] Everywhere in src/ruptures/datasets/ replace the use of numpy.random.RandomState by numpy.random.default_rng
[x] Make the random functions take the seed argument from the top call
[x] adds seed optional argument to user exposed pw_constant
[x] adds seed optional argument to user exposed pw_normal
[x] adds seed optional argument to user exposed pw_linear
[x] adds seed optional argument to user exposed pw_wavy
[x] adds seed optional argument to draw_bkps
[x] adds tests to check that if seed is the same between two runs, then the generated signals are the same
[x] (Addon) adds the display extra option while building the package for the coverage computation in the CI (otherwise, the display tests are skipped because matplotlib is not installed

codecov[bot] commented 2 years ago

Codecov Report

Merging #221 (afd9e74) into master (a39f031) will increase coverage by 2.05%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #221      +/-   ##
==========================================
+ Coverage   96.71%   98.76%   +2.05%     
==========================================
  Files          40       40              
  Lines         973      972       -1     
==========================================
+ Hits          941      960      +19     
+ Misses         32       12      -20

Flag	Coverage Δ
unittests	`98.76% <100.00%> (+2.05%)`	:arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/ruptures/datasets/pw_constant.py	`100.00% <100.00%> (ø)`
src/ruptures/datasets/pw_linear.py	`100.00% <100.00%> (ø)`
src/ruptures/datasets/pw_normal.py	`100.00% <100.00%> (ø)`
src/ruptures/datasets/pw_wavy.py	`100.00% <100.00%> (ø)`
src/ruptures/utils/drawbkps.py	`100.00% <100.00%> (ø)`
src/ruptures/show/display.py	`91.17% <0.00%> (+58.82%)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a39f031...afd9e74. Read the comment docs.

deepcharles commented 2 years ago

The current random number generation is not well designed indeed. ~~But two consecutive rpt.pw_constant should not produce the same signal (unless specified otherwise) because it would be impossible to generate two different signals.~~ (edit: PR description was corrected.)

To me, the correct behaviour should be as follows.

Case 1 (default behaviour): no seed is specified (i.e. seed=None).

import numpy as np

import ruptures as rpt

signal1, bkps1 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1)
signal2, bkps2 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1)

Here, (signal1, bkps1) and (signal2, bkps2) should be different.

Case 2: a seed is specified.

signal1, bkps1 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=12345)
signal2, bkps2 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=12345)

Here, (signal1, bkps1) and (signal2, bkps2) should the same.

Case 3: a random generator is specified.

rng = np.random.default_rng(seed=12345)
signal1, bkps1 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=rng)
signal2, bkps2 = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=rng)

rng = np.random.default_rng(seed=12345)
signal1_bis, bkps1_bis = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=rng)
signal2_bis, bkps2_bis = rpt.pw_constant(n_samples=200, n_features=3, n_bkps=3, noise_std=1, seed=rng)

(signal1, bkps1) and (signal2, bkps2) should different but (signal1, bkps1) and (signal1_bis, bkps1_bis) should be the same, as well as (signal2, bkps2) and (signal2_bis, bkps2_bis)

deepcharles commented 2 years ago

It seems that the behaviour I described is what your code does (and not the one in the PR description). Can you change the PR description?

oboulant commented 2 years ago

It seems that the behaviour I described is what your code does (and not the one in the PR description). Can you change the PR description?

My mistake for the initial PR description, I forgot to mention that seed was initially fixed ! Sorry about that, description is fixed !

deepcharles / ruptures

fix(datasets): propagate random seed for consistency #221

Description

How to reproduces

Work done

Codecov Report