biocore / biom-format

The Biological Observation Matrix (BIOM) Format Project
http://biom-format.org

Explicitly allow setting a random seed for subsample #916

Closed wasade closed 1 year ago

wasade commented 1 year ago

Fixes #914. You can now specify the random seed on call to Table.subsample(...).

cc @gibsramen @rob-knight

gibsramen commented 1 year ago

Thanks, @wasade. How much work would it be to rewrite the random shuffling to use the new NumPy random API? Seeding with np.random.seed is now a legacy approach: it sets the global random seed, which can have unintended consequences. The new RNG documentation is here.
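
To illustrate the concern about global state, here is a minimal sketch in plain NumPy. It is not biom-format code; the helper names `shuffle_with_global_seed` and `shuffle_with_generator` are hypothetical and exist only for this comparison. It shows that reseeding through `np.random.seed` changes what unrelated code later draws from the global stream, while a local `Generator` leaves that stream alone.

```python
import numpy as np


def shuffle_with_global_seed(values, seed):
    """Hypothetical helper: legacy style, reseeds the process-wide RNG."""
    np.random.seed(seed)
    return np.random.permutation(values)


def shuffle_with_generator(values, seed):
    """Hypothetical helper: new style, uses a private Generator."""
    rng = np.random.default_rng(seed)
    return rng.permutation(values)


values = np.arange(10)

# Reference: the first draw from the global stream after seeding it with 0.
np.random.seed(0)
expected_next = np.random.random()

# A Generator-based shuffle does not touch the global stream.
np.random.seed(0)
shuffle_with_generator(values, 42)
unaffected = np.random.random()

# A legacy shuffle reseeds the global stream as a side effect.
np.random.seed(0)
shuffle_with_global_seed(values, 42)
clobbered = np.random.random()

print(unaffected == expected_next)  # global stream left intact
print(clobbered == expected_next)   # global stream was replaced
```

Both helpers are reproducible for a fixed seed; the difference is only in whether the caller's global random state survives the call.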

wasade commented 1 year ago

Good observation. Likely easy, given the methods are there, but the signature of the underlying Cython method will need a minor adjustment.

>>> from numpy.random import default_rng
>>> rng = default_rng(12345)
>>> rng.multinomial
<built-in method multinomial of numpy.random._generator.Generator object at 0x7f9c385c1900>
>>> rng.permutation
<built-in method permutation of numpy.random._generator.Generator object at 0x7f9c385c1900>
>>> 
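
As a rough sketch of how those two Generator methods could drive a seeded subsample of a single count vector: the function below is an assumption made for illustration, not the Cython routine in biom-format, and the name `subsample_counts` and its parameters are hypothetical. Without replacement it permutes one entry per observed item and keeps the first `n`; with replacement it draws from a multinomial over the observed proportions.

```python
import numpy as np


def subsample_counts(counts, n, with_replacement=False, seed=None):
    """Hypothetical sketch: subsample a 1-D vector of counts down to n items."""
    counts = np.asarray(counts, dtype=np.int64)
    rng = np.random.default_rng(seed)  # local RNG, no global state involved
    total = counts.sum()
    if total == 0:
        raise ValueError("cannot subsample an empty count vector")

    if with_replacement:
        # Draw n items according to the observed proportions.
        return rng.multinomial(n, counts / total)

    if n > total:
        raise ValueError("n exceeds the total count")

    # One entry per observed item, labelled by its feature index.
    items = np.repeat(np.arange(counts.size), counts)
    picked = rng.permutation(items)[:n]
    return np.bincount(picked, minlength=counts.size)


counts = [5, 0, 3, 12]
first = subsample_counts(counts, 10, seed=12345)
second = subsample_counts(counts, 10, seed=12345)
print(np.array_equal(first, second))  # same seed, same subsample
print(first.sum())                    # always n
```

Passing the seed (or a Generator built from it) into the routine keeps the result reproducible without touching `np.random`'s global state.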
wasade commented 1 year ago

@gibsramen if this is green can you merge?

gibsramen commented 1 year ago

Looks good, thanks! I think this is good to merge.

wasade commented 1 year ago

Thanks!