Closed: mschauer closed this issue 10 years ago
This is pretty deeply problematic and I really don't know how to solve it. cc: @ViralBShah
The underlying library we use, DSFMT, is designed for generating double-precision random numbers, and each draw carries only 53 bits of entropy. It is difficult to get random integers out of DSFMT.
Perhaps the best thing to do is to document this?
A more efficient way to generate random integers from DSFMT:
We keep a cache of 256 random bits, which can be obtained by generating five double-precision random numbers and extracting their mantissa bits (5 × 53 = 265 bits > 256 bits in total). Whenever the cache is used up, we refill it by generating another five doubles. An additional C function might be needed to make this efficient.
In this way, we can obtain four 64-bit integers from five doubles (currently we need eight doubles).
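To make the scheme concrete, here is a minimal sketch in Julia (the names `BitCache`, `randmantissa`, and `rand64!` are hypothetical, not proposed API, and the `Uint` spellings match this thread's vintage of Julia). One caveat: reinterpreting a double in [1, 2) exposes 52 uniformly random mantissa bits rather than 53, since the leading significand bit is implicit and fixed, but 5 × 52 = 260 bits still covers a 256-bit cache.

```julia
# Hypothetical sketch of the proposed bit cache; none of these names
# are real Base API. Assumes rand() is a DSFMT double produced as
# "[1, 2) minus 1", so reinterpreting 1.0 + rand() exposes 52
# uniformly random mantissa bits.

const MANTISSA_BITS = 52
const MANTISSA_MASK = (one(Uint64) << MANTISSA_BITS) - 1

# The random mantissa bits of one double, costing one DSFMT draw.
randmantissa() = reinterpret(Uint64, 1.0 + rand()) & MANTISSA_MASK

type BitCache
    acc::Uint128   # pool of leftover random bits
    n::Int         # number of valid bits currently in acc
end
BitCache() = BitCache(uint128(0), 0)

# Hand out 64 random bits, topping up the pool from doubles as needed.
function rand64!(c::BitCache)
    while c.n < 64
        c.acc |= uint128(randmantissa()) << c.n
        c.n += MANTISSA_BITS
    end
    out = uint64(c.acc & uint128(typemax(Uint64)))  # low 64 bits
    c.acc >>= 64
    c.n -= 64
    return out
end
```

Amortized, four 64-bit words then cost about five doubles, matching the ratio above, versus the two doubles per word we pay now.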
I think what we should do here is use up only 32 bits of entropy, regardless of architecture, if `n <= typemax(Int32)`, and use up 64 bits of entropy otherwise. Obviously, doing `rand(Int)` is still going to be platform-dependent, but at least this approach lets someone write code that works the same on 32-bit and 64-bit machines, as long as they avoid that. @lindahua's performance improvement is a good idea too, but a bit unrelated, afaict.
I updated my pull request, what do you think?
This is of course not completely unexpected, as `rand(1:10, 100)` returns an array of `Int`s, which are `WORD_SIZE`-dependent, but it will bite users from time to time, e.g. #5548. On the other hand, if `length(therange) < typemax(Uint32)`, it is a bit wasteful to draw `length(therange)` `Uint64`s instead of `Uint32`s. Would that justify a switch, given the additional transferability between systems?
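To put a rough number on "a bit wasteful", using the cost quoted earlier in the thread (eight doubles for four 64-bit integers, i.e. two DSFMT doubles per `Uint64` draw, and assuming one double suffices per `Uint32` draw):

```julia
# Back-of-the-envelope cost of rand(1:10, 100); the per-draw double
# counts are assumptions taken from the ratio quoted above.
n = 100
doubles_if_uint64 = 2n  # 200 doubles consumed when each draw uses 64 bits
doubles_if_uint32 = n   # 100 doubles if a Uint32 draw sufficed
```

So for small ranges the switch would roughly halve the raw entropy consumed per array.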