Closed: dgryski closed this issue 7 years ago.
I took a look into this and made a benchmark with a Go xorshift library I found on GitHub: https://github.com/lazybeaver/xorshift#description
Here is the benchmark I made: http://pastebin.com/wEMKSqPf Note that the benchmark includes getting the random numbers into the 1-10000 range, as per the test requirements.
To run:
1) Go must be installed.
2) Put the code in a file whose name ends in `_test.go`, e.g. `rand_test.go`, preferably in its own directory (a minimal sketch of such a file is shown after this list).
3) Run `$ go test -bench=.` in that directory; the `.` is a regex selecting which benchmarks to run (here, all of them).
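For reference, here is a minimal sketch of what such a `_test.go` file can look like. This is not the pastebin code; it only benchmarks the stock `rand.Intn`, with the same 1-10000 bounding mentioned above.

```go
// rand_test.go -- a minimal benchmark file sketch.
package main

import (
	"math/rand"
	"testing"
)

var sink int // assigned so the compiler cannot eliminate the benchmarked call

func BenchmarkRandIntn(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = rand.Intn(10000) + 1 // random number in the 1-10000 range
	}
}
```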
Findings (with fluff edited out):
```
$ go test -bench=.
BenchmarkNewRandIntn            50000000    36.8 ns/op
BenchmarkNewXorShift64Star     200000000     6.61 ns/op
BenchmarkNewXorShift128Plus    300000000     4.47 ns/op
BenchmarkNewXorShift1024Star   200000000     7.61 ns/op
$ go test -bench=.
BenchmarkNewRandIntn            50000000    36.9 ns/op
BenchmarkNewXorShift64Star     200000000     6.59 ns/op
BenchmarkNewXorShift128Plus    300000000     4.44 ns/op
BenchmarkNewXorShift1024Star   200000000     7.61 ns/op
$ go test -bench=.
BenchmarkNewRandIntn            30000000    36.7 ns/op
BenchmarkNewXorShift64Star     200000000     6.60 ns/op
BenchmarkNewXorShift128Plus    300000000     4.46 ns/op
BenchmarkNewXorShift1024Star   200000000     7.63 ns/op
```
Looks to me like you are right about the slowdown in `rand.Intn`. Of the three xorshift variants, XorShift128Plus is the fastest. This makes sense: the finalizing step applied after the shifts is an addition for the "plus/+" variants and a multiplication for the "star/*" variants, and addition is cheaper than multiplication.
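To make the plus/star distinction concrete, here is a sketch of the two core steps. The shift and multiplier constants are the ones from Vigna's papers; the library linked above may use slightly different ones.

```go
package main

import "fmt"

// xorshift64*: after the shifts, the state is multiplied by a 64-bit constant ("star").
func xorshift64star(s *uint64) uint64 {
	x := *s
	x ^= x >> 12
	x ^= x << 25
	x ^= x >> 27
	*s = x
	return x * 2685821657736338717
}

// xorshift128+: after the shifts, the two state words are simply added ("plus").
func xorshift128plus(s *[2]uint64) uint64 {
	x, y := s[0], s[1]
	s[0] = y
	x ^= x << 23
	s[1] = x ^ y ^ (x >> 17) ^ (y >> 26)
	return s[1] + y
}

func main() {
	s64 := uint64(42)
	s128 := [2]uint64{42, 7}
	fmt.Println(xorshift64star(&s64), xorshift128plus(&s128))
}
```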
I can go ahead and update the tests after some people have had a chance to look at this.
I wrote https://github.com/dgryski/trifles/blob/master/fastrand/fastrand.go, which includes an Intn that avoids modulo bias on bounded ranges.
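For context, the usual way such an `Intn` avoids modulo bias is rejection sampling. Below is a minimal sketch of that idea over any 64-bit source; it is not the fastrand code itself, and `intn` is a hypothetical helper assuming n > 0.

```go
package main

import (
	"fmt"
	"math/rand"
)

// intn re-draws until the value falls below the largest multiple of n that
// fits in a uint64, so every residue class in [0, n) is equally likely.
func intn(next func() uint64, n uint64) uint64 {
	limit := ^uint64(0) - ^uint64(0)%n // largest multiple of n representable in a uint64
	v := next()
	for v >= limit {
		v = next() // reject the biased tail instead of folding it into [0, n)
	}
	return v % n
}

func main() {
	fmt.Println(intn(rand.Uint64, 10000) + 1) // uniform row id in 1..10000
}
```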
The slowdown will be larger under concurrent code due to lock contention, so it's not just the speed of the RNG.
Did you measure it? The JSON test, which doesn't use rand, is also somewhat slow. I don't think this is the bottleneck of the benchmark.
I have not benchmarked this particular case, but have seen numerous times in the past where access to a single-threaded rand instance has caused performance degradation due to lock contention. I'm fine to ignore this issue until a blocking profile shows otherwise.
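For reference, one common way to sidestep that contention (a sketch only, not something the benchmark currently does) is to give each goroutine its own generator, for example via a sync.Pool; randomRowID is a hypothetical helper here.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// rngPool hands each goroutine its own *rand.Rand, so concurrent handlers do
// not contend on the mutex guarding the package-level generator.
var rngPool = sync.Pool{
	New: func() interface{} {
		return rand.New(rand.NewSource(time.Now().UnixNano()))
	},
}

func randomRowID() int {
	r := rngPool.Get().(*rand.Rand)
	id := r.Intn(10000) + 1
	rngPool.Put(r)
	return id
}

func main() {
	fmt.Println(randomRowID())
}
```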
The RNG is used only in the Fortune test. The Fortune test involves the HTTP server, JSON encoding, and a database query. The database query also takes a lock. I don't think the lock around the RNG can be the bottleneck.
Looking at https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Go/go/src/hello/hello.go it seems `rand.Intn` is called in almost all the handlers to select random rows from the database.
I'm sorry, I was wrong. But the plaintext and JSON tests don't use rand. All the other tests hit the database as often as they use rand, so rand can only be the bottleneck if it is slower than the database.
Locks in `db.Query()` (fastest case):
https://github.com/golang/go/blob/master/src/database/sql/sql.go#L666-L682
https://github.com/golang/go/blob/master/src/database/sql/sql.go#L750-L782

`src.Int63()` (called while the lock is held):
https://github.com/golang/go/blob/master/src/math/rand/rng.go#L232-L246
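For readers following along, the math/rand side of that comparison is roughly the pattern below: one mutex in front of one shared source, with the generator work running while the lock is held. This is a simplified stand-in, not the actual standard-library code.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// lockedSource is a simplified stand-in for the pattern discussed above:
// every caller serializes on one mutex, and the Int63 work runs under it.
type lockedSource struct {
	mu  sync.Mutex
	src rand.Source
}

func (s *lockedSource) Int63() int64 {
	s.mu.Lock()
	n := s.src.Int63()
	s.mu.Unlock()
	return n
}

func (s *lockedSource) Seed(seed int64) {
	s.mu.Lock()
	s.src.Seed(seed)
	s.mu.Unlock()
}

func main() {
	r := rand.New(&lockedSource{src: rand.NewSource(1)})
	fmt.Println(r.Intn(10000) + 1)
}
```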
Additionally, there is already go-prefork. I don't want to add complexity or a dependency without a significant improvement.
As I said, I'm fine to ignore this until it shows up in a profile.
It might be interesting to profile with the new lock-contention profile in 1.8: https://rakyll.org/mutexprofile/
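A sketch of wiring that up (a hypothetical server stub, not the benchmark's actual code): enable sampling with runtime.SetMutexProfileFraction and expose the profile through net/http/pprof.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/, including /debug/pprof/mutex
	"runtime"
)

func main() {
	// Sample roughly one in five mutex contention events (Go 1.8+).
	runtime.SetMutexProfileFraction(5)

	// ... register the benchmark handlers here ...

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Then `go tool pprof http://localhost:8080/debug/pprof/mutex` shows where goroutines spent time blocked on contended locks.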
Similar to the Java issue #1152, all calls to rand.Intn() are behind a single mutex in Go to prevent concurrent access to the RNG. These calls should be updated to use an xorshift RNG seeded with the current time in nanoseconds.
https://github.com/TechEmpower/FrameworkBenchmarks/search?q=rand.Intn&type=Code&utf8=✓
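A sketch of what that change could look like inside a handler (hypothetical helper names, not an actual diff), reusing the xorshift64* step shown earlier and seeding it from the clock as described:

```go
package main

import (
	"fmt"
	"time"
)

// newState seeds an xorshift state from the wall clock in nanoseconds,
// guarding against the all-zero state that xorshift cannot recover from.
func newState() uint64 {
	s := uint64(time.Now().UnixNano())
	if s == 0 {
		s = 1
	}
	return s
}

// rowID advances a caller-owned xorshift64* state and maps the result into
// 1..10000. The plain modulo carries a tiny bias for this range; see the
// rejection-sampling sketch above for the unbiased version.
func rowID(state *uint64) int {
	x := *state
	x ^= x >> 12
	x ^= x << 25
	x ^= x >> 27
	*state = x
	return int((x*2685821657736338717)%10000) + 1
}

func main() {
	state := newState() // one state per goroutine/handler, no shared mutex
	fmt.Println(rowID(&state))
}
```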