rasbt closed 8 years ago
This is not really the recommended way, right? Would you do it like this? That can be tricky, I think.
You mean in contrast to sth like this?
>>> import numpy as np
>>> rndst = np.random.RandomState(1234)
Not sure, but this only works for e.g., randint and some others, right?
>>> rndst.randint(3)
2
>>> rndst.random((3, 5))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'mtrand.RandomState' object has no attribute 'random'
Hm, aren't np.random.RandomState(seed=1234) and np.random.seed(1234) essentially the same?
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html
Well, random.seed changes a global (private?) random state, while random.RandomState is an explicit object that you can pass around. Both depend on execution order, but I feel with the object it is more explicit.
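A minimal sketch of that difference, assuming only NumPy's legacy seeding API. The helper name draw_sample and the draw sizes are made up for illustration; the point is that the global seed and the explicit object start from the same stream, but only the object can be handed to the code that needs it.

```python
import numpy as np


def draw_sample(rng, n=3):
    # Hypothetical helper: it draws from whichever generator it is handed,
    # so the caller decides where the randomness comes from.
    return rng.randint(0, 10, size=n)


# Global state: np.random.seed reseeds a hidden module-level generator,
# and every np.random.* call anywhere in the program advances it.
np.random.seed(1234)
from_global = np.random.randint(0, 10, size=3)

# Explicit object: the state lives in `rng` and is passed around by hand.
rng = np.random.RandomState(seed=1234)
from_object = draw_sample(rng)

# Same seed, same first draw; the streams are identical at this point.
same_first_draw = np.array_equal(from_global, from_object)

# Both approaches still depend on execution order: rebuilding the object
# with the same seed replays the stream from the start.
replayed = draw_sample(np.random.RandomState(seed=1234))
```

Passing rng as an argument keeps a later, unrelated np.random call elsewhere in a notebook from silently shifting the numbers a cell produces.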
It works with all distributions, but not all aliases. random is an alias for random_sample. random is also just a special case of uniform, right? I tend to use uniform because that's more explicit to me.
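A small check of that claim, as a sketch: two RandomState objects seeded identically (the seed 1234 and the shape (3, 5) are just carried over from the session above), one calling random_sample and the other calling uniform on the interval [0, 1).

```python
import numpy as np

# Two generators with identical state, so their streams can be compared.
rng_a = np.random.RandomState(seed=1234)
rng_b = np.random.RandomState(seed=1234)

# random_sample draws floats from the half-open interval [0.0, 1.0).
sample = rng_a.random_sample((3, 5))

# uniform with low=0.0 and high=1.0 covers the same interval explicitly.
unif = rng_b.uniform(low=0.0, high=1.0, size=(3, 5))

# True when both calls produced the same numbers from the same seed.
matches = np.allclose(sample, unif)
```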
Both depend on execution order, but I feel with the object it is more explicit.
I agree, will swap it out later when I get home!
thanks :)
Just updated the RandomState! General question (related to the third notebook), the dataset that is available via load_digits, where's it coming from? (Is it a lower-resolution subset of MNIST?) -- I think someone at the tutorials will likely ask ;)
It's unrelated to MNIST, I think, but also collected by NIST. The DESCR attribute should say:
Notes
-----
Data Set Characteristics:
:Number of Instances: 5620
:Number of Attributes: 64
:Attribute Information: 8x8 image of integer pixels in the range 0..16.
:Missing Attribute Values: None
:Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
:Date: July; 1998
This is a copy of the test set of the UCI ML hand-written digits datasets
http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.
Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each block. This generates
an input matrix of 8x8 where each element is an integer in the range
0..16. This reduces dimensionality and gives invariance to small
distortions.
For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G.
T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C.
L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,
1994.
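The block-counting step described in the DESCR text can be sketched with plain NumPy. This is not the NIST preprocessing code; the function name downsample_bitmap is hypothetical and the input is a synthetic random bitmap standing in for a scanned digit.

```python
import numpy as np


def downsample_bitmap(bitmap):
    """Count the 'on' pixels in each non-overlapping 4x4 block of a 32x32 bitmap."""
    bitmap = np.asarray(bitmap)
    if bitmap.shape != (32, 32):
        raise ValueError("expected a 32x32 bitmap")
    # Split into (8 block rows, 4 rows per block, 8 block cols, 4 cols per block),
    # then sum over the two within-block axes.
    blocks = bitmap.reshape(8, 4, 8, 4)
    return blocks.sum(axis=(1, 3))


# Synthetic stand-in for a binarized scan: each pixel is 0 or 1.
rng = np.random.RandomState(seed=1234)
bitmap = rng.randint(0, 2, size=(32, 32))

# 8x8 matrix of counts; each entry lies in 0..16 because a block has 16 pixels.
features = downsample_bitmap(bitmap)
```

Flattening features gives the 64 attributes per instance listed in the description.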
Thanks, I dunno why I didn't just check the DESCR :/. (On a side note: it would maybe be useful to add the desc .rst files somehow to the function docstrings so that they also appear in the API doc online?)
Btw I can merge the changes once in a while so that you don't lose the overview here ;). Hehe, going through notebook 01.4, I must say that using the random_state=1999 (to get the 0.33 proportion in the iris test/train split) was a tad sneaky :); I changed it using the new stratify=y option.
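For reference, a sketch of what stratify=y buys here, assuming a current scikit-learn where train_test_split lives in sklearn.model_selection. The test_size=0.2 and random_state=1234 values are picked for illustration and are not the notebook's settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target  # 150 samples, 50 per class

# stratify=y keeps the class proportions of y in both splits,
# so the balance no longer depends on a lucky random_state.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234, stratify=y
)

# Per-class counts in each split.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
```

With three equally sized classes, each class should make up one third of both the training and the test labels, whatever seed is used.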
The DESCR should be in the user guide, but it looks like it is not. We should probably fix that. And yeah, feel free to merge. I'm a bit caught up still in my book stuff.
Okay, maybe we should open an issue then. No worries, take your time; I also want to get through all the notebooks this weekend, hopefully, so that I can tackle the other things we discussed (presentation figures, the linear regression implementation, etc.). I will merge the changes then for now!
sweet lgtm. I'd probably use np.random.RandomState(seed=1234) for reproducibility.