JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.55k stars 5.47k forks source link

Threaded RNG should use same stream at different locations not different streams #34386

Closed mauro3 closed 3 years ago

mauro3 commented 4 years ago

The new, thread-safe RNG (PR #32407) uses separate RNG streams (link to code) instead of using the "trick" advocated in the docs of using just one chain but sampling from different locations (implemented using Future.randjump).

Wikipedia (fourth bullet) suggests that at least for Monte-Carlo the randjump strategy should be used. Or are the seeds of the separate streams chosen such that inter-stream numbers are non-correlated too? This line suggest that no such approach is taken.

Is the randjump the right way to go about this? If so, I could do a PR.

X-refs:

Cc: @rfourquet

rfourquet commented 4 years ago

I was just wondering what exactly should be done. Shall randjump be used automatically only on creation (when retrieving default_rng() on a specific thread for the first time)? In this case, some details need to be sorted out, like which RNG to use as a reference? (given that we don't know which ones exist yet). Then, if the user calls e.g. seed!(0) in thread 1, before the creation of the global RNG rng2 in thread 2, does she expect rng2 to also be seeded (predictible) ?

Should randjump be used also with seed! (in which case seed! shouldn't be called from within every thread) ?

It seems to me that using randjump automatically forces to have either all thread rngs seeded or all not seeded. I migth be a bit confused here, but the it shows at least that the details need to be explicited :)

vtjnash commented 4 years ago

Wikipedia (fourth bullet) suggests that at least for Monte-Carlo the randjump strategy should be used. Or are the seeds of the separate streams chosen such that inter-stream numbers are non-correlated too? This line suggest that no such approach is taken.

If you look at the citation there, it actually says care must be taken that they are not initialized from a linear recurrence. Yes, one method is call randjump, which is what got published in that paper. I'm not a cryptography, but my interpretation of that information is that also just not using a linear recurrence for initialization should be statistically acceptable. Our implementation therefore lacks the provable guarantee for the other approaches (and corresponding robustness against bad seed design), but I'm thinking it must be statistically immensely improbable for two unrelated seeds to be correlated (the state space of MT is huge). But, to reveal my trick, some care must be taken in my use of the world "unrelated". Simply feeding in sequential numbers, such as you might get from another standard PRNG (including MT) does not give unrelated seeds. But we don't do that. If the current mechanism of using the kernel's randomness is discovered to be insufficient, we might be able to use a secure hash of the seed, such as AES, to improve the quality of the input. Though my real suggestion in that case might be to get a better kernel which provides a true CSPRNG, since that CS stands for cryptographically secure, and its commitment to not being a linear relation is the foundational aspect of system security for nearly everything.

This was discussed during the implementation of the current work, though I'm not sure any one source covers the entirety of the design rationale. And I've done a bit of hand-waving in my analysis based on what I'm assuming are reasonable assumptions about statistics, but am no expert, or even that well-read of an amateur.

peteroupc commented 4 years ago

For your information.

Perhaps you should consider alternatives to Mersenne Twister for the parallel PRNG, especially because it uses such a big state.

vtjnash commented 4 years ago

Yes, as described there, we use option 3 by default and assume that you don't normally want reproducible randomness. For the MT we use, the probability this results in the same seed on two threads is indistinguishable from zero (which might actually be too low, since any two true RNG should have a small chance of appearing to be correlated). I think we have other existing issues proposing changing from MT. The "big" state is tiny relative to the rest of the machine, so it's not really a convincing factor. But there may be other better PRNGs now that would be good to look into switching to, if someone provided an implementation of it.

tkf commented 4 years ago

Quoting what I mentioned in the discourse thread:

I fond what Guy Steele mentioned in the last half of this talk interesting:

https://www.youtube.com/watch?v=0hlBkQ5DjaY

IIRC there is a type of PRNG that can be split in such a way that the distribution of resulting streams combined is equivalent to the original stream (or something like that). It would be nice to look into (probably some follow up studies of):

which were mentioned in the talk.

There is a Julia implementation of counter-based PRNG: https://github.com/sunoru/Random123.jl

vtjnash commented 4 years ago

Just to say it clearly, I think those researches are generally talking about reproducable (seeded) streams. We want something a bit different (unpredictablity) as the default. So these are not contradictory, just different needs.

peteroupc commented 4 years ago

In my opinion, if unpredictability of random numbers (or information security in general) is a goal, then only a cryptographic RNG is appropriate and an application should have only one thread-safe instance of the cryptographic RNG for the whole application to access. If reproducibility is important, the API can provide a separate (perhaps parallel) PRNG for that purpose, where the user can supply a seed (see also "Implementing New RNG APIs").

tkf commented 4 years ago

My bad. This issue is not about reproducibility. My comment was off-topic.

We want something a bit different (unpredictablity) as the default.

Isn't this issue about making sure Julia has "independent" streams across threads? IIUC you don't need "unpredictability" in the sense of CSPRNG within a single thread. You just need that different threads produce independent random numbers. (Or maybe my understanding of the word "unpredictability" is not accurate.)

vtjnash commented 4 years ago

That was a good comment, I was replying as much for my own benefit as others to clarify why I don’t think that research is applicable to the default (non-seeded) case. Thus, I don’t know what this issue is about—I think it’s about asking your question to request more documentation maybe. It might be just about independence, but it seems to be mistaking independence for “unpredictability”. And yes, I deliberately picked an ambiguous word here to try to separate it from the usual statistical vocabulary and focus instead of the usage. A CSPRNG is not required within an application for statistical purposes (it is required for cryptography of course), but the CSPRGN is necessary to create seeds for the PRGN (else the values on one thread may be statistically predictable from the values seen on another thread). The independence is only broken if you use a flawed seeding approach, such as using sequentially generated numbers, such as 1,2,3 or rand(),rand(),rand(). Using either randjump or a cryptographic seed should be equally safe for the purposes of splitting a stream into uncorrelated streams. The randjump approach is fraught with complexities and difficulties, while using different cryptographic seeds is trivial. (Actually, I think the cryptographic seed is probably actually more “safe” by simple argument that it seems it would severely compromised the integrity of MT if two completely unrelated machines were more likely to be correlated than two threads on one machine).

vtjnash commented 4 years ago

Note that the CSPRNG in that case may be initialized from the PRNG, or even from known sequential numbers, so the question regarding the use of a CSPRNG for cryptographic security is fully orthogonal to the question of setting a seed for reproducibility.

mauro3 commented 4 years ago

Ok, so reading through this thread, my original question:

Or are the seeds of the separate streams chosen such that inter-stream numbers are non-correlated too?

can be answered with "yes". I wasn't aware that using different (truly) random numbers as seeds will guarantee uncorrelated streams. Edit: the reference on Wikipedia (http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf) states: "We used a hypothesis that a set of PRNGs based on linear recurrences is mutually "independent" if the characteristic polynomials are relatively prime to each other." Choosing (truly) random seeds will not guarantee this relative-prime property, but presumably the likelihood of them not being relatively prime is low, right?

Documentation is needed though on how to seed threaded RNGs such that the streams are uncorrelated.

mauro3 commented 4 years ago

Although, from a UI point of view, it would be nice if only one seed would need to be set and that all the thread-local seeds would be updated consistently (and predictably).

oscardssmith commented 4 years ago

@mauro3 the chance of coprime numbers approaches pi^2/6 as the numbers get bigger

tkf commented 4 years ago

Although, from a UI point of view, it would be nice if only one seed would need to be set and that all the thread-local seeds would be updated consistently (and predictably).

This is related to my first comment. Reproducible computation using thread-local RNG is not possible in general with dynamic scheduling. Since there is no reliable way to turn-off dynamic scheduling, I don't think consistent seeding across threads has to get a high priority.

stephenll commented 4 years ago

At my company, financial/insurance simulations, we typically break up the simulations into predefined fixed "chunks", and assign a seed to each chunk. Then let the dynamic scheduler do its thing. For the most part this makes our simulations repeatable no matter the machine or number of cores. We have been doing this in c# with success.

I ported our engine to Julia hoping to get the same type of repeatability, and I did, but the randjump calc is slower than the time it takes to simulate a "chunk". Depending on the size of the model it makes sense to either not use threading at all, or now create a chunk size that is dependent on model size just so the randjump makes threading improve the run time.

Just sharing some recent observations.

StefanKarpinski commented 4 years ago

Thank you for the feedback, @stephenll. That's very helpful!

peteroupc commented 4 years ago

I ported our engine to Julia hoping to get the same type of repeatability, and I did, but the randjump calc is slower than the time it takes to simulate a "chunk".

Maybe that lies in the fact that Mersenne Twister (MT19937) has a big state compared to other modern PRNGs (about 2500 bytes compared to 16 or 32 bytes), so that jumpahead is less trivial? (By the way, different PRNGs support different strategies to support "streams" of numbers that behave like independent random numbers. For linear PRNGs, that is jumpahead. For counter-based PRNGs, that's the use of "nearby" seeds.)

stephenll commented 4 years ago

I should of mentioned that the single threaded Julia runs about the same speed as the multithreaded on 4 cores c# code I converted from. So huge win using Julia even without the threading. Less more readable code. I don’t need to rely on the c# devs to experiment or research. Can’t wait to see how the experimental threading develops.

mbauman commented 3 years ago

If I understand correctly, this is fixed/superseded by #40546.