Closed: mschauer closed this issue 10 years ago
This is pretty deeply problematic and I really don't know how to solve it. cc: @ViralBShah
The underlying library we use, DSFMT, is designed for generating double-precision random numbers, and each draw carries only 53 bits of entropy. It is difficult to get random integers out of DSFMT.
Perhaps the best thing to do is to document this?
A more efficient way to generate random integers from DSFMT:
We keep a cache of 256 random bits, which can be obtained by generating five double-precision random numbers and extracting their mantissa bits (5 × 53 = 265 bits > 256 bits in total). Whenever the cache is used up, we refill it by generating another five doubles. An additional C function might be needed to make this efficient.
In this way, we can obtain four 64-bit integers from five doubles (currently we need eight doubles).
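To make the scheme concrete, here is a minimal sketch in Julia (the names `BitCache`, `randmantissa`, and `rand64!` are hypothetical, not proposed API, and the `Uint` spellings match this thread's vintage of Julia). One caveat: reinterpreting a double in [1, 2) exposes 52 uniformly random mantissa bits rather than 53, since the leading significand bit is implicit and fixed, but 5 × 52 = 260 bits still covers a 256-bit cache.

```julia
# Hypothetical sketch of the proposed bit cache; none of these names
# are real Base API. Assumes rand() is a DSFMT double produced as
# "[1, 2) minus 1", so reinterpreting 1.0 + rand() exposes 52
# uniformly random mantissa bits.

const MANTISSA_BITS = 52
const MANTISSA_MASK = (one(Uint64) << MANTISSA_BITS) - 1

# The random mantissa bits of one double, costing one DSFMT draw.
randmantissa() = reinterpret(Uint64, 1.0 + rand()) & MANTISSA_MASK

type BitCache
    acc::Uint128   # pool of leftover random bits
    n::Int         # number of valid bits currently in acc
end
BitCache() = BitCache(uint128(0), 0)

# Hand out 64 random bits, topping up the pool from doubles as needed.
function rand64!(c::BitCache)
    while c.n < 64
        c.acc |= uint128(randmantissa()) << c.n
        c.n += MANTISSA_BITS
    end
    out = uint64(c.acc & uint128(typemax(Uint64)))  # low 64 bits
    c.acc >>= 64
    c.n -= 64
    return out
end
```

Amortized, four 64-bit words then cost about five doubles, matching the ratio above, versus the two doubles per word we pay now.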
I think what we should do here is use up only 32 bits of entropy, regardless of architecture, if `n <= typemax(Int32)`, and use up 64 bits of entropy otherwise. Obviously, doing `rand(Int)` is still going to be platform-dependent, but at least this approach lets someone write code that works the same on 32-bit and 64-bit machines, as long as they avoid that. @lindahua's performance improvement is a good idea too, but a bit unrelated, afaict.
I updated my pull request, what do you think?
This is of course not completely unexpected, as `rand(1:10, 100)` returns an array of `Int`s, which are `WORD_SIZE`-dependent, but it will bite users from time to time, e.g. #5548. On the other hand, if `length(therange) < typemax(Uint32)`, it is a bit wasteful to draw `length(therange)` `Uint64`s instead of `Uint32`s. Would that justify a switch, given the additional transferability between systems?
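To put a rough number on "a bit wasteful", using the cost quoted earlier in the thread (eight doubles for four 64-bit integers, i.e. two DSFMT doubles per `Uint64` draw, and assuming one double suffices per `Uint32` draw):

```julia
# Back-of-the-envelope cost of rand(1:10, 100); the per-draw double
# counts are assumptions taken from the ratio quoted above.
n = 100
doubles_if_uint64 = 2n  # 200 doubles consumed when each draw uses 64 bits
doubles_if_uint32 = n   # 100 doubles if a Uint32 draw sufficed
```

So for small ranges the switch would roughly halve the raw entropy consumed per array.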