golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.17k stars 17.46k forks source link

math/rand/v2: revised API for math/rand #61716

Closed rsc closed 8 months ago

rsc commented 1 year ago

Based on earlier discussions in #60751, #26263, and #21835, as well as discussions with @robpike, I propose adding a new version of math/rand, imported as math/rand/v2, to the standard library. The full text is below but is nearly identical to what we discussed in #60751.

The most direct motivation for this proposal is to clean up math/rand and fix these many lingering problems, especially the use of an outdated generator, slow algorithms, and the unfortunate collision with crypto/rand.Read.

A more indirect but equally important motivation is to set an example for other v2 packages in the standard library. Creating math/rand/v2 will let us work out tooling issues (support for v2 packages in gopls, goimports, and so on) in a relatively rarely used package with fairly low stakes attached before moving on to more commonly used, higher-stakes packages like sync/v2 or encoding/json/v2.

This proposal establishes the pattern for v2 packages in the standard library, namely that they are subdirectories of the original package and that the starting point for the API is the original package, with each deviation clearly justified. In particular, v2 packages are not an opportunity to redesign something from scratch "just because". The existing APIs work well, and deviations should be carefully considered. A from-scratch new API is just as likely (probably more likely) to introduce new problems as it is to fix existing ones.

Once math/rand/v2 has shipped, we would tag and delete x/exp/rand. This would keep programs that already use x/exp/rand working (by reference to the tagged version or older versions of x/exp) but allow us to delete the code from the main branch of x/exp, making clear that development on it has ended.

A prototype of these math/rand/v2 changes can be found at https://go.dev/cl/502506 and the CLs below it in the stack.

Proposal

The math/rand/v2 API would use math/rand as a starting point and then make the following backwards incompatible changes:

  1. Remove Rand.Read and top-level Read. It is almost always a mistake to pretend that a pseudo-random generator is a good source of arbitrarily long byte sequences. The kinds of simulations and non-deterministic algorithms for which math/rand is a good choice almost never need byte sequences. Read is the only common piece of API between math/rand and crypto/rand, and code should essentially always use crypto/rand.Read instead. (math/rand.Read and crypto/rand.Read are problematic because they have the the same signature; math/rand.Int and crypto/rand.Int also both exist, but with different signatures, meaning code never accidentally mistakes one for the other.)

  2. Remove Source.Seed, Rand.Seed, and top-level Seed. Top-level Seed is deprecated as of Go 1.20. Source.Seed and Rand.Seed assume that the underlying source can be seeded by a single int64, which is only true of a limited number of sources. Specific source implementations can provide Seed methods with appropriate signatures, or none at all for generators that cannot be reseeded; the details of seeding do not belong in the general interface.

    Note that removing top-level Seed means that the top-level functions like Int will always be randomly seeded rather than deterministically seeded. math/rand/v2 will not pay attention to the randautoseed GODEBUG setting that math/rand does; auto-seeding for top-level functions is the only mode. This means in turn that the specific PRNG algorithm used by the top-level functions is unspecified and can change from release to release without breaking any existing code.

  3. Change the Source interface to have a single Uint64() uint64 method, replacing Int63() int64. The latter was overfitting to the original Mitchell & Reeds LFSR generator. Modern generators can provide uint64s, and uint64s have fewer special cases at call sites.

  4. Remove Source64, which is unnecessary now that Source provides the Uint64 method.

  5. Use a more straightforward implementation in Float32 and Float64. Taking Float64 as an example, it originally used float64(r.Int63()) / (1<<63), but this has the problem of occasionally rounding up to 1.0, which Float64 must not. We tried changing it to float64(r.Int63n(1<<53) / (1<<53), which avoids the rounding problem, but we decided it was a backwards compatibile change to break the value streams that rand.Rand generators, and instead added a retry loop for the rare 1.0 case. Now we can make the breaking change, which is simpler and faster.

    Note that some people have observed that the simple division does not make use of all the possible float64 values in the range [0, 1). For example values like 1/(1<<54), 1/(1<<55), 3/(1<<55), and so on are not generated at all, both in today's math/rand and in this simpler algorithm. Only the values 0 and 1/(1<<53) are, not the ones in between. It is possible to introduce even more complex algorithms to spread out the low values more while preserving something that can be deemed a uniform distribution, but these algorithms seem like overkill. The simple division should continue to suffice.

  6. (New since #60751.) Correct bias problems in ExpFloat64 and NormFloat64. @flyingmutant reports that these have detectable bias, perhaps due to a mistake in porting the Ziggurat algorithms from float32 to float64. We can take this opportunity to correct those.

  7. Implement Rand.Perm in terms of Rand.Shuffle. Shuffle is a bit more efficient, and then we have only one implementation. It has been suggested elsewhere to remove Rand.Perm instead, but keeping it costs little (especially if implemented in terms of Shuffle) and avoids unnecessary churn for users.

  8. Rename Intn, Int31, Int31n, Int63, Int64n to IntN, Int32, Int32N, Int64, Int64N. The 31 and 63 in the names are unnecessarily pedantic and confusing, and the capitalized N is more idiomatic for a second "word" in the name in Go. Note: The capital N's in the names are a change from #60751.

  9. Add Uint32, Uint32N, Uint64, Uint64N, Uint, UintN, both as top-level functions and methods on Rand. At the least, we need Uint64 to provide access to the widest values that the Source can provide. But we may as well complete the set while we are here. Note: The capital N's in the names are a change from #60751.

  10. (New since #60751) Add a generic top-level function N that is like Int64N or Uint64N but works for any integer type. In particular this allows using rand.N(1*time.Minute) to get a random duration in the range [0, 1*time.Minute).

  11. Use Lemire's algorithm in N, IntN, UintN, and so on. Preliminary benchmarks show a 40% savings compared to v1 Int31n and a 75% savings compared to v1 Int63n. (Like with Float64, since this changes the values returned, it is a breaking change that can only be applied in math/rand/v2.)

  12. Add a new Source implementation, PCG-DXSM, with this API:

     func NewPCG(seed1, seed2 uint64) *PCG
     type PCG struct { ... }
     func (p *PCG) Uint64() uint64
     func (p *PCG) Seed(seed1, seed2 uint64)
     func (p *PCG) MarshalBinary() ([]byte, error)
     func (p *PCG) UnmarshalBinary(data []byte) error

    PCG is a simple, efficient algorithm with good statistical randomness properties. The DXSM variant was introduced by the author specifically to correct a rare, obscure shortcoming in the original (PCG-XSLRR) and is now the default generator in Numpy.

    PCG is provided for people who are writing simulations and need a seeded, deterministic source of randomness suitable for most purposes. Note that PCG is not assumed in any code. In particular the top-level functions need not use PCG (and in the current prototype do not). If in the future we add another algorithm, it will sit alongside PCG as a true peer.

    Note: MarshalBinary and UnmarshalBinary are new since #60751.

  13. Remove the Mitchell & Reeds LFSR generator and NewSource. An alternative to removing it would be to give it a name like NewLFSR or NewMitchellReeds, but that would serve no purpose. The obvious purpose would be to provide the original math/rand source so that code that needs to reproduce the exact math/rand outputs can do so. But the other changes to Rand, in routines like Intn and Float64, change the derived values from Rand's methods for a given Source input stream. Since preserving the math/rand Source stream would not suffice to preserve the math/rand Rand stream, preserving the math/rand Source is not worthwhile. Programs that need exactly the math/rand value streams can continue to use that package; it will not be removed.

  14. (New since #60751 and the original proposal in this issue) Add a new Source implementation, ChaCha8, with this API:

     func NewChaCha8(seed [32]byte) *ChaCha8
     type ChaCha8 struct { ... }
     func (c *ChaCha8) Uint64() uint64
     func (c *ChaCha8) Seed(seed [32]byte)
     func (c *ChaCha8) MarshalBinary() ([]byte, error)
     func (c *ChaCha8) UnmarshalBinary(data []byte) error

    ChaCha8 is a cryptographically-strong random number generator derived from the ChaCha8 stream cipher. It provides security equivalent to ChaCha8 encryption.

  15. (New since #60751 and the original proposal in this issue) Use a per-OS-thread ChaCha8 for the global random generator, both in math/rand/v2 and in math/rand (when unseeded). The current implementation uses a per-OS-thread wyrand, which was fine compared to the Mitchell/Reeds/Thompson generator but is not really great compared to PCG or ChaCha8. In particular, it is strange for the top-level randomness to be less studied and less robust than the two concrete implementations provided. Using ChaCha8 also gives the default top-level randomness real cryptographic strength, turning accidental use of math/rand where crypto/rand would be more appropriate into less of a security problem. Using ChaCha8 runs slower than using wyrand, but it is about as fast as the current sync.Mutex-locked generator in math/rand (~6ns per value on my 2019 Intel MacBook Pro).

Differences since previous discussion

The differences between this proposal and what we discussed in #60751 are:

These differences are also marked above.

Discussion Summary

Summarizing the discussion on #60751:

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

zephyrtronium commented 1 year ago

Adjust the ExpFloat64 and NormFloat64 algorithms to remove bias. (Not implemented yet.)

I have an example implementation of this in https://github.com/zephyrtronium/pgregory_rand/tree/master/misc/practrand which I used to verify that it does resolve the bias detected by practrand. My approach uses recalculated 1024-step ziggurats specifically built for 64-bit PRNGs to improve the rejection procedure efficiency, which gives about -6% ns/op versus the 128-step normal ziggurat and +4% ns/op versus the 256-step exponential one (possibly due to bounds checking? I can't investigate at the moment), at the cost of the tables being a collective 48 kB instead of 4.5 kB.

goos: linux
goarch: amd64
pkg: math/rand
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
               │ /home/branden/old.bench │      /home/branden/new.bench       │
               │         sec/op          │   sec/op     vs base               │
NormFloat64-12               3.961n ± 2%   3.709n ± 0%  -6.35% (p=0.000 n=20)
ExpFloat64-12                3.801n ± 1%   3.974n ± 0%  +4.56% (p=0.000 n=20)
geomean                      3.880n        3.840n       -1.04%

An alternative implementation is to use the same tables but adjusting the input to avoid reusing random bits between the step selection and the final output, which would leave tables unchanged and should result in the same performance but still resolve the bias.

This is out of band for the proposal per se, but would it be appropriate for me to send in a CL based on 502506 implementing the new ziggurats? I don't know whether or how I can do that.

rsc commented 1 year ago

@zephyrtronium If you can send a CL with the new code that'd be great. We will not be able to submit it as is, but I can lift it from there and add it to the v2 stack. 48kB of table is a lot more than 4.5kB, so your alternative leaving the tables unchanged sounds like the right approach. Thanks!

joeshaw commented 1 year ago

Creating math/rand/v2 will let us work out tooling issues (support for v2 packages in gopls, goimports, and so on)

Typically packages are identified in code by the last portion of their import path. This would be undesirable for packages versioned this way, so is the subtext here that tools like goimports would recognize math/rand/v2 and write import statements with renamed identifiers (import rand "math/rand/v2")?

rsc commented 1 year ago

@joeshaw Tools like goimports already recognize math/rand/v2 and understand that it is 'rand' even without a renamed import. Such paths are common already for the root package of a module that is >=v2.

gopherbot commented 1 year ago

Change https://go.dev/cl/516275 mentions this issue: math/rand: don't reuse bits in ziggurat variates

gopherbot commented 1 year ago

Change https://go.dev/cl/502500 mentions this issue: math/rand/v2: add, optimize N, UintN, Uint32N, Uint64N

gopherbot commented 1 year ago

Change https://go.dev/cl/502506 mentions this issue: math/rand/v2: delete Mitchell/Reeds source

gopherbot commented 1 year ago

Change https://go.dev/cl/516859 mentions this issue: math/rand/v2: add ChaCha8

gopherbot commented 1 year ago

Change https://go.dev/cl/516857 mentions this issue: math/rand/v2: rename various functions

gopherbot commented 1 year ago

Change https://go.dev/cl/502504 mentions this issue: math/rand/v2: simplify Perm

gopherbot commented 1 year ago

Change https://go.dev/cl/502496 mentions this issue: math/rand/v2: update benchmarks

gopherbot commented 1 year ago

Change https://go.dev/cl/502497 mentions this issue: math/rand/v2: remove Read

gopherbot commented 1 year ago

Change https://go.dev/cl/516858 mentions this issue: math/rand/v2: clean up regression test

gopherbot commented 1 year ago

Change https://go.dev/cl/502499 mentions this issue: math/rand/v2: change Source to use uint64

gopherbot commented 1 year ago

Change https://go.dev/cl/516860 mentions this issue: math/rand, math/rand/v2: use ChaCha8 for global rand

gopherbot commented 1 year ago

Change https://go.dev/cl/502503 mentions this issue: math/rand/v2: optimize Float32, Float64

gopherbot commented 1 year ago

Change https://go.dev/cl/502505 mentions this issue: math/rand/v2: add PCG-DXSM

gopherbot commented 1 year ago

Change https://go.dev/cl/502495 mentions this issue: math/rand/v2: start of new API

gopherbot commented 1 year ago

Change https://go.dev/cl/502498 mentions this issue: math/rand/v2: remove Rand.Seed

rsc commented 1 year ago

People seem overall happy with this proposal, since we already hashed almost everything out in the previous discussion. I've updated the CL stack with recent adjustments and updated benchmarks. Thanks to @zephyrtronium for contributing fixes for ExpFloat64 and NormFloat64, now included in the stack.

After discussion with @FiloSottile, I've added a ChaCha8 source alongside the PCG source, so that people who want cryptographically strong randomness can have it.

rsc commented 1 year ago

Also based on discussion with @FiloSottile, I've changed the implementation of the top-level random functions to use an unexported per-OS-thread ChaCha8 source. (Anyone who wants this in Source form can build a trivial Source out of rand.Uint64, as noted before.)

The top-level per-thread ChaCha8 source is slower than the per-thread wyrand source I had been using in the stack, but it's about the same speed as the top-level locked source in math/rand. We eliminate the lock but use the cost savings to pay for cryptographic randomness by default instead. This seems like a reasonable trade. In particular, wyrand is less well understood than either PCG or ChaCha8, so it seems strange to have a package that exports PCG and ChaCha8 and then uses something less robust implicitly in the top-level functions.

I've timed using each of them, and ChaCha8 is a little faster than PCG on 32-bit, a little slower on 64-bit, but overall quite similar. Both are slower than wyrand, but wyrand is fairly trivial. Given the choice between PCG and ChaCha8 for the top-level randomness, ChaCha8 has a similar cost but provides cryptographic strength. That makes accidental use of math/rand/v2 instead of crypto/rand much less of a problem.

ChaCha8 requires maintaining 304 bytes of extra state in the per-OS-thread m structure in the runtime.

[This is a separate comment so people can thumbs-up or thumbs-down the idea of using ChaCha8 as the global generator separate from the other update.]

rsc commented 1 year ago

If we do keep ChaCha8 as the global generator and commit to having some cryptographic global generator like ChaCha8 in the future, we could potentially bring back both the top-level Read function and the Rand.Read method.

In the earlier discussion, some people asked what to do about getting short byte sequences from the PRNG. We essentially sacrificed the convenience of Read for the security of forcing people over to crypto/rand. But if we make the top-level Read backed by a cryptographic generator, we could bring Read back and have both convenience and security.

Not entirely sure whether this is worth doing, but it seems worth discussing.

[This is a separate comment so people can thumbs-up or thumbs-down the idea of bringing Read back separate from the other updates.]

Merovius commented 1 year ago

@rsc

ChaCha8 requires maintaining 304 bytes of extra state in the per-OS-thread m structure in the runtime.

If I checked correctly, an m is currently 408 bytes, so this seems like a lot? Is it possible this might slow down unrelated code by the extra size?

If we do keep ChaCha8 as the global generator and commit to having some cryptographic global generator like ChaCha8 in the future, we could potentially bring back both the top-level Read function and the Rand.Read method.

ISTM that crypto/rand.Read is still more secure, is it not? AIUI it uses facilities by the kernel to get data, which I would tend to trust more than the Go runtime to do this correctly (no offense - common OS kernels just have more eyes on them).

robpike commented 1 year ago

I feel that 304 bytes is too much for the default source. Making chacha8 easy to use is great, but let's keep the default one as PCG because it is very small. 19 times smaller in fact.

rsc commented 1 year ago

The m struct tends to be noise compared to the per-thread stack allocated by the operating system. On my MacBook Pro, for example, an m's stack is 512 kB. Another 304 bytes is noise. Even on the systems where C is not involved and Go allocates the stacks, we allocate 8 kB.

@aclements points out that the state should be per-P, in which case there are even fewer: on a 16-core machine the cost would be under 5kB total for the process. (As a relative fraction compared to OS stacks it remains noise.)

zephyrtronium commented 1 year ago

My usual stance is that cryptographically secure PRNGs should be used for cryptographic applications and not general purpose ones. A rough analogy is that we don't advise people to use atomic operations when they don't need atomicity: even when it isn't measurably worse, it's a tool with a particular semantic purpose. For that reason, unless it is legitimately the case that users should choose ChaCha8 over crypto/rand for some cryptographic purposes (not at all my field of expertise, I'd appreciate clarification), I can't say I'm a fan of advertising it as cryptographically secure.

However, as I've advocated on other proposals, I believe it's important to present multiple choices of PRNG in any library. Having options indicates especially to new programmers that there are reasons to choose one PRNG over another. Here we could have advice along the lines of, "ChaCha8 is a suitable PRNG for most purposes. Applications which require many independent random sequences may prefer PCG, which has similar throughput and a much smaller state but marginally lower statistical quality over billions of variates."

Considering that advice, including ChaCha8 makes me want the Jump operation on PCG. Compared to repeated seeding which is subject to the birthday problem, jumping by 2^64 steps yields up to 2^64 PCG states that are guaranteed independent for 2^64 calls to Uint64. That's very valuable for exactly the cases where I'd choose PCG over ChaCha8. Of course, a proposal can wait until after this one is accepted, but I think it's valuable to consider it as a reason to provide both generators.

robpike commented 1 year ago

@rsc Perhaps we are talking about different things. One of my original points regarding the need to replace the generator was the possibility of a lock-free simulation by putting a generator in every object. There may be many thousands, or even much more than that, such items in a large simulation. The size of the generator matters then.

Please clarify what the status of PCG would be in your proposal. Is it just another generator, is it the default when you ask for a new source (which is what I would prefer), or does it just not exist at all?

Merovius commented 1 year ago

@robpike My understanding is that ChaCha8 would be the "global source", as in a lock-free per-m (or per-p) source used when calling the package-scoped functions. This is different from today, where there is one locked source used by the package-scoped functions.

Additionally, there would be two exported types PCGSource and ChaCha8Source (well, I'm not sure about the latter one. It seems of dubious use), each with their own custom seeding, suitable for your purpose.

That is, you could use a PCGSource as a field in a struct, for just 16 bytes of state, but if you just called rand.N(42), you'd use a properly seeded CSPRNG.

robpike commented 1 year ago

Maybe it's time to post the full proposed API here.

rsc commented 1 year ago

The full API is here: https://go-review.googlesource.com/c/go/+/516859/2/api/next/61716.txt

pkg math/rand/v2, func ExpFloat64() float64 #61716
pkg math/rand/v2, func Float32() float32 #61716
pkg math/rand/v2, func Float64() float64 #61716
pkg math/rand/v2, func Int() int #61716
pkg math/rand/v2, func Int32() int32 #61716
pkg math/rand/v2, func Int32N(int32) int32 #61716
pkg math/rand/v2, func Int64() int64 #61716
pkg math/rand/v2, func Int64N(int64) int64 #61716
pkg math/rand/v2, func IntN(int) int #61716
pkg math/rand/v2, func N[$0 intType]($0) $0 #61716
pkg math/rand/v2, func New(Source) *Rand #61716
pkg math/rand/v2, func NewChaCha8([32]uint8) *ChaCha8 #61716
pkg math/rand/v2, func NewPCG(uint64, uint64) *PCG #61716
pkg math/rand/v2, func NewZipf(*Rand, float64, float64, uint64) *Zipf #61716
pkg math/rand/v2, func NormFloat64() float64 #61716
pkg math/rand/v2, func Perm(int) []int #61716
pkg math/rand/v2, func Shuffle(int, func(int, int)) #61716
pkg math/rand/v2, func Uint32() uint32 #61716
pkg math/rand/v2, func Uint32N(uint32) uint32 #61716
pkg math/rand/v2, func Uint64() uint64 #61716
pkg math/rand/v2, func Uint64N(uint64) uint64 #61716
pkg math/rand/v2, func UintN(uint) uint #61716
pkg math/rand/v2, method (*ChaCha8) Seed([32]uint8) #61716
pkg math/rand/v2, method (*ChaCha8) Uint64() uint64 #61716
pkg math/rand/v2, method (*PCG) MarshalBinary() ([]uint8, error) #61716
pkg math/rand/v2, method (*PCG) Seed(uint64, uint64) #61716
pkg math/rand/v2, method (*PCG) Uint64() uint64 #61716
pkg math/rand/v2, method (*PCG) UnmarshalBinary([]uint8) error #61716
pkg math/rand/v2, method (*Rand) ExpFloat64() float64 #61716
pkg math/rand/v2, method (*Rand) Float32() float32 #61716
pkg math/rand/v2, method (*Rand) Float64() float64 #61716
pkg math/rand/v2, method (*Rand) Int() int #61716
pkg math/rand/v2, method (*Rand) Int32() int32 #61716
pkg math/rand/v2, method (*Rand) Int32N(int32) int32 #61716
pkg math/rand/v2, method (*Rand) Int64() int64 #61716
pkg math/rand/v2, method (*Rand) Int64N(int64) int64 #61716
pkg math/rand/v2, method (*Rand) IntN(int) int #61716
pkg math/rand/v2, method (*Rand) NormFloat64() float64 #61716
pkg math/rand/v2, method (*Rand) Perm(int) []int #61716
pkg math/rand/v2, method (*Rand) Shuffle(int, func(int, int)) #61716
pkg math/rand/v2, method (*Rand) Uint32() uint32 #61716
pkg math/rand/v2, method (*Rand) Uint32N(uint32) uint32 #61716
pkg math/rand/v2, method (*Rand) Uint64() uint64 #61716
pkg math/rand/v2, method (*Rand) Uint64N(uint64) uint64 #61716
pkg math/rand/v2, method (*Rand) UintN(uint) uint #61716
pkg math/rand/v2, method (*Zipf) Uint64() uint64 #61716
pkg math/rand/v2, type ChaCha8 struct #61716
pkg math/rand/v2, type PCG struct #61716
pkg math/rand/v2, type Rand struct #61716
pkg math/rand/v2, type Source interface { Uint64 } #61716
pkg math/rand/v2, type Source interface, Uint64() uint64 #61716
pkg math/rand/v2, type Zipf struct #61716
robpike commented 1 year ago

Thanks, but I wasn't clear. I meant more like a go doc output, so one can see what function attaches to which generator, and which are per-m, and so on.

ainar-g commented 1 year ago

I'm not particularly against it, but if the main concern for the removal of (non-topLevel) Rand.Read is the similarity with crypto/rand, why not rename it and also remove the io.Reader logic? I.e. something like Rand.Fill([]byte). Or, to keep the Rand API more minimal, rand.Fill(r Rand, b []byte).

mateusz834 commented 1 year ago

@rsc missing MarshalBinary/UnmarshalBinary for ChaCha8?

in https://github.com/golang/go/issues/61716#issue-1833573973 you mention them.

jnowls commented 1 year ago

While there's merits to keeping the signature of Shuffle the same as in math/rand, I feel like this is a perfect opportunity to use generics and add a generic implementation instead.

I've come across some code which incorrectly shuffles code by providing a mistakenly implemented version of Shuffle, which are both obvious programmer errors, but can often be easy to miss:

rand.Shuffle(len(s), func(i, j int) {
  s[i] = s[j]
})

rand.Shuffle(len(s), func(i, j int) {
  s[i], s[j] = s[i], s[j]
}

This is something that a generic version like (func Shuffle[T ~[]any](s T)) would address, at the cost of some few cases where Shuffle is called on a non-slice context.

For what it's worth, (and the sample size is small) there's no uses of Shuffle in the go source tree that don't provide the standard swap function that I could find.

DeedleFake commented 1 year ago

Instead of changing Shuffle(), maybe a generic swap function analogous to reflect.Swapper() could be added to slices that could simplify the usage. Something like rand.Shuffle(s, slices.Swapper[int]). Or maybe just add a rand.ShuffleSlice(). Or both. That way rand.Shuffle() doesn't have to be restricted to slices.

Edit: A quick search find a ton of usages in third-party code that all do nothing but reimplement a basic slice swap over and over and over. I think if a v2 is being done, this is definitely something to take a look at.

gophun commented 1 year ago

@jnowls Not everything that one wants to shuffle is a slice. And if it's a slice, it doesn't necessarily have to be single element-wise.

rsc commented 1 year ago

@robpke sorry for the delay - I've posted a diff in Gerrit showing the full v2 docs on the right and the v1 docs on the left for comparison.

https://go-review.googlesource.com/c/go/+/520036/1/src/math/rand/v2/doc.txt

fzipp commented 1 year ago

I've posted a diff in Gerrit showing the full v2 docs on the right and the v1 docs on the left for comparison. https://go-review.git.corp.google.com/c/go/+/520036/1/src/math/rand/v2/doc.txt

Here the link in a form that works without a Google login: https://go-review.googlesource.com/c/go/+/520036/1/src/math/rand/v2/doc.txt

rsc commented 1 year ago

Thank you @fzipp, I have fixed the URL. (For incredibly frustrating security reasons, any time people at Google visit the public URLs we get redirected to this Google-internal URL, and then we often forget to change 'git.corp.google' to 'googlesource' when pasting back.)

zephyrtronium commented 1 year ago

I'm still not clear on the motivation for adding a CSPRNG in particular to math/rand/v2. In what circumstances would I choose ChaCha8 over crypto/rand? When would I choose crypto/rand over ChaCha8?

rsc commented 11 months ago

@zephyrtronium, crypto/rand will give you unreproducible randomness. You would use rand.NewChaCha8 if you wanted repeatable randomness, such as for a simulation, but for whatever reason you believed that PCG was not high enough quality randomness. There are certainly good arguments that in certain demanding cases PCG is not random enough. On the other hand, ChaCha8 being cryptographically strong means that it is definitely random enough for any simulation. Instead of having to pick a single one to provide, math/rand/v2 can provide both under clear names and let users pick. This also future proofs the choice, in that if there is some better algorithm in the future, we can add it next to NewChaCha8 and NewPCG and there won't be one that is the "privileged" default.

Providing ChaCha8 also lets us use it for the top-level functions. This also serves as a bit of a belt-and-suspenders defense against code that should use crypto/rand but doesn't. In this case, the damage will be far more limited because ChaCha8 is itself cryptographically strong and therefore unpredictable, to an extent that PCG simply is not. Changing the global generator also affects math/rand, so even very old code that accidentally uses math/rand will be improved.

I spoke to @robpike about his concerns about ChaCha8 being what the top-level functions use, and I believe those are resolved. Rob did not realize there is no "default" source anymore, because rand.NewSource is replaced by rand.NewChaCha8 and rand.NewPCG - you have to say what you want (as I just noted in the first paragraph of this comment).

rsc commented 11 months ago

Have all remaining concerns about this proposal been addressed?

mateusz834 commented 11 months ago

@rsc https://github.com/golang/go/issues/61716#issuecomment-1672631554

mateusz834 commented 11 months ago

How the default ChaCha8 source (used by top-level functions) will be populated? From the CL 516860 is seems like is uses getRandomData function, which reads the /dev/urandom file, but in case of an error it seems to use nanotime() to set the seed. Is this fine??

rsc commented 11 months ago

@mateusz834 Thanks I will get the Marshal methods added. Consider them part of the proposal.

How the default ChaCha8 source (used by top-level functions) will be populated? From the CL 516860 is seems like is uses getRandomData function, which reads the /dev/urandom file, but in case of an error it seems to use nanotime() to set the seed. Is this fine??

We will look into that. These reads no longer fail on modern systems so it's not much of an issue.

mateusz834 commented 11 months ago

@rsc

These reads no longer fail on modern systems so it's not much of an issue.

Yeah, but there is still a possibility that the ChaCha8 might not be initialized with cryptographically secure random data (unlikely), at least that should be documented in the package docs. Also it would be probably better to use the getRandom syscall (as crypto/rand does) instead of reading /dev/urandom.

jnowls commented 11 months ago

@rsc I would love to see my comment get addressed, I feel that if we are recreating the api for math/rand, a second look over shuffling over slices should be considered, even if a different version or nothing comes out of it :)

rsc commented 11 months ago

Re https://github.com/golang/go/issues/61716#issuecomment-1676771440, we can't use generics on *rand.Rand, and the top-level Shuffle should match the method on *rand.Rand. We could consider adding some other-named helper, like ShuffleSlice, as a separate proposal. Honestly it doesn't seem like it comes up enough.

rsc commented 11 months ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group