In support of -race, the Go runtime implements a fairly inexpensive random shuffling [1], [2], [3] in the (goroutine) scheduler:
when runqput is invoked, the goroutine is added to the tail of the local (runnable) queue with probability 0.5;
when runqputslow is invoked, the local (runnable) queue is reshuffled.
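The two hooks above can be modeled with a toy queue. This is a minimal sketch, not the runtime's code: localQueue, put, spill, and newRNG are hypothetical names, a slice stands in for the runtime's fixed-size ring buffer, and the handling of the runnext slot and of the batch handed to the global queue reflects my reading of the linked source [2], [3].

```go
package main

import (
	"fmt"
	"math/rand"
)

// localQueue is a toy stand-in for a P's local run queue.
// Goroutines are represented by integer IDs; 0 means "no goroutine".
type localQueue struct {
	runnext int   // goroutine that would run next, ahead of the queue
	queue   []int // FIFO of runnable goroutine IDs
}

// newRNG returns a seeded generator so that runs are reproducible.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// put models the race-mode behavior of runqput: a goroutine destined for
// the runnext slot is demoted to the tail of the queue with probability 0.5.
func (q *localQueue) put(rng *rand.Rand, gid int, next bool) {
	if next && rng.Intn(2) == 0 {
		next = false
	}
	if !next {
		q.queue = append(q.queue, gid)
		return
	}
	// The previous occupant of runnext, if any, moves to the tail.
	if q.runnext != 0 {
		q.queue = append(q.queue, q.runnext)
	}
	q.runnext = gid
}

// spill models runqputslow: half of the local queue plus the new goroutine
// form a batch, which is shuffled before it is handed to the global queue.
func (q *localQueue) spill(rng *rand.Rand, gid int) []int {
	n := len(q.queue) / 2
	batch := append([]int{}, q.queue[:n]...)
	batch = append(batch, gid)
	q.queue = q.queue[n:]
	rng.Shuffle(len(batch), func(i, j int) { batch[i], batch[j] = batch[j], batch[i] })
	return batch
}

func main() {
	rng := newRNG(1)
	q := &localQueue{}
	for gid := 1; gid <= 6; gid++ {
		q.put(rng, gid, true)
	}
	fmt.Println("runnext:", q.runnext, "queue:", q.queue)
	fmt.Println("batch for global queue:", q.spill(rng, 7))
}
```

Note that both perturbations stay within a single P: the relative order of goroutines that sit on different Ps is unaffected.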
In the CL for [1], Dmitry's comment [4] proposes a non-local perturbation by shuffling the global (runnable) queue. He argues that the implemented change is fairly deterministic; as far as I can tell, however, his idea was never implemented:
"Yes, it is quite deterministic. And if a thread is scheduled most likely it won't be preempted in the next 20ms at least. And also if two threads start at the same time, but one runs 10ms till the reordering point and the other runs for 1000ms till the reordering point, the chances that these points will almost always executed in the same order. That's why aggressive reordering is necessary."
Since running end-to-end tests (i.e., roachtest) with -race might not be feasible owing to the performance overhead, a lightweight (global) randomization inside the scheduler could help expose (data) race bugs in roachtest without the overhead of TSan.
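To make the idea of a global perturbation concrete, here is a minimal sketch on a toy model. Everything in it is an assumption for illustration: globalQueue, shuffle, get, and seededRNG are hypothetical names, not runtime APIs, and an actual patch would have to operate on the runtime's own queue structure under its scheduler lock.

```go
package main

import (
	"fmt"
	"math/rand"
)

// globalQueue is a toy model of the scheduler's global run queue.
// Goroutines are represented by integer IDs.
type globalQueue struct {
	gs []int // runnable goroutine IDs in FIFO order
}

// seededRNG returns a seeded generator so that a failing order can be replayed.
func seededRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// shuffle is the hypothetical perturbation: a Fisher-Yates pass over the
// whole global queue, so goroutines that came from different Ps can swap
// places.
func (q *globalQueue) shuffle(rng *rand.Rand) {
	for i := len(q.gs) - 1; i > 0; i-- {
		j := rng.Intn(i + 1)
		q.gs[i], q.gs[j] = q.gs[j], q.gs[i]
	}
}

// get pops the head of the queue, as a P looking for work would.
func (q *globalQueue) get() (int, bool) {
	if len(q.gs) == 0 {
		return 0, false
	}
	gid := q.gs[0]
	q.gs = q.gs[1:]
	return gid, true
}

func main() {
	for seed := int64(1); seed <= 3; seed++ {
		q := &globalQueue{gs: []int{1, 2, 3, 4, 5, 6}}
		q.shuffle(seededRNG(seed))
		fmt.Println("seed", seed, "dispatch order", q.gs)
	}
}
```

Seeding the generator is a design choice of the sketch: it keeps a schedule reproducible once a run has exposed a bug.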
We already have the infrastructure to create custom builds [5], [6]. Thus, all that's needed is a patch to be applied in CI; roachtests can then be staged with the custom binary to enable runtime fuzzing.
[1] https://github.com/golang/go/issues/11372
[2] https://github.com/golang/go/blob/cd6d225bd30608544ecf4a3e5a7aa1d0607a66db/src/runtime/proc.go#L5953
[3] https://github.com/golang/go/blob/cd6d225bd30608544ecf4a3e5a7aa1d0607a66db/src/runtime/proc.go#L6005
[4] https://go-review.googlesource.com/c/go/+/11795/3#message-8a36b86dfffd37e7fe1a09d51f66d74985236697
[5] https://github.com/cockroachdb/cockroach/blob/76da6c7dfb4d71ad614715ec5e230a6cc76fbd0e/build/teamcity/internal/release/build-and-publish-patched-go/impl.sh#L60-L62
[6] https://github.com/cockroachdb/cockroach/blob/76da6c7dfb4d71ad614715ec5e230a6cc76fbd0e/build/teamcity/internal/release/build-and-publish-patched-go/impl-fips.sh#L6-L8
Jira issue: CRDB-28903