bashtage / ng-numpy-randomstate

Numpy-compatible random number generator that supports multiple core pseudo RNGs and explicitly parallel generation.

Can you provide a function to generate random number from multiple streams #122

Closed jyzhang-bjtu closed 6 years ago

jyzhang-bjtu commented 6 years ago

For my research, I need multiple streams (more than 1000) to form a single input data set. I also want the generated streams to be reusable in the future. However, if I generate the multiple streams as follows:

    import numpy as np
    import randomstate.prng.xorshift1024 as rnds

    def func(N, m):
        # one generator per stream, all seeded identically
        rs = [rnds.RandomState(0) for _ in range(N)]
        a0 = np.zeros([N, m])
        for i in range(N):
            # advance stream i by i jumps so the streams do not overlap
            rs[i].jump(i)
            a0[i] = rs[i].normal(loc=0.0, scale=1.0, size=[m])
        return a0

the total time of the above code for (N, m) = (8192, 256) is 160 s, which is too slow. For (1024, 1024) it is about 3 s, which is acceptable. However, if mt19937 is used, the speed is unacceptable.

Can you provide a function to generate random numbers from multiple streams? It should work as follows (ms_seed and normal_ms are the functions I am proposing):

    import numpy as np
    import randomstate.prng.mt19937 as rnds

    # four streams, each with seed 0
    sd0 = np.zeros(4)
    # ms_seed creates the four streams
    rs = rnds.ms_seed(sd0)

    # generate 4 x 100 random numbers from the four streams;
    # a0[0] is from the first stream, a0[1] is from the second stream, and so on
    a0 = rs.normal_ms(size=(4, 100))

It also seems that this function should be implemented in CUDA.

bashtage commented 6 years ago

mt19937 is slow and the jump operator is slow (fine for 4 streams, terrible for 1000s) -- you should use either xoroshiro128 or xorshift1024.

I don't support CUDA. If you need CUDA you should use numba, which has support for xoroshiro128 and jump. You can pregenerate the states on the CPU and then pass these to your CUDA threads. IIRC generating 10K states takes < 1 s in numba.

In randomstate, initializing 1000 states from xoroshiro128 takes 2.92 ms.

From your example it isn't clear why you need independent streams. If you sample n*m numbers and then reshape to be (n,m) the resulting array should have independent rows and columns (statistically). The only real case for independent streams is if you are generating the random numbers in parallel, either on a cluster or say on a GPU.
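The sample-then-reshape suggestion above can be sketched as follows. This is a minimal illustration that uses NumPy's own `np.random.RandomState` as a stand-in for a randomstate generator; the helper name `sample_rows` and the seed are assumptions made for the example, not part of either library.

```python
import numpy as np

def sample_rows(n, m, seed=0):
    """Draw n*m standard normals from a single seeded stream, reshaped to (n, m)."""
    # Stand-in generator (assumption): NumPy's legacy RandomState.
    rs = np.random.RandomState(seed)
    flat = rs.normal(loc=0.0, scale=1.0, size=n * m)
    return flat.reshape(n, m)

a0 = sample_rows(8192, 256)

# The same seed reproduces the same array, so the data can be regenerated later.
a0_again = sample_rows(8192, 256)
```

Because everything comes from one stream, no jump calls are needed, and fixing the seed gives the reusability asked for in the original question.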

jyzhang-bjtu commented 6 years ago

My use case is deep learning. For example, I want to generate an epoch of size 1e5. The total data size is then, for example, (1e5, 256), where 256 is the length of each sample. Then I want to shuffle the data set and iteratively learn on it. However, if I use rand(1e5, 256), I cannot shuffle the data and keep working on the same data set.
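For readers with the same use case, one possible way to shuffle and still reuse the same data set is sketched below: generate the data once from a fixed seed, then shuffle row indices each epoch. It uses plain NumPy; the helper `epoch_batches`, the seeds, the batch size, and the reduced row count are all assumptions made for illustration.

```python
import numpy as np

n, m = 1000, 256  # small n for illustration; the thread uses n = 1e5

# Generate the data set once from a fixed seed so it can be regenerated later.
data = np.random.RandomState(0).normal(size=(n, m))

# A separate generator controls only the shuffling order.
shuffler = np.random.RandomState(1)

def epoch_batches(data, batch_size, rng):
    """Yield mini-batches covering every row once, in a freshly shuffled order."""
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        yield data[order[start:start + batch_size]]

for epoch in range(3):
    for batch in epoch_batches(data, 128, shuffler):
        pass  # a training step would go here
```

The underlying array is never modified, so every epoch sees the same samples in a different order.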

Thanks a lot. I found xoroshiro128 is quick enough for my applications. Best

bashtage commented 6 years ago

Going to close this.