SkyAPM / go2sky

Distributed tracing and monitor SDK in Go for Apache SkyWalking APM
https://skywalking.apache.org/
Apache License 2.0
448 stars 122 forks source link

It has small probability will be panic #140

Closed Luckyboys closed 2 years ago

Luckyboys commented 2 years ago

Describe the bug it has small probability will be panic

To Reproduce Steps to reproduce the behavior: call CreateExitSpan in http transport the sample rate is 1%

Expected behavior no panic

Screenshots

panic: runtime error: index out of range [-1]
goroutine 67790330 [running]:
math/rand.(*rngSource).Uint64(...)
    /usr/local/go/src/math/rand/rng.go:249
math/rand.(*rngSource).Int63(0xc0004c3500, 0x13c6780)
    /usr/local/go/src/math/rand/rng.go:234 +0x98
math/rand.(*Rand).Int63(...)
    /usr/local/go/src/math/rand/rand.go:85
math/rand.(*Rand).Int31(...)
    /usr/local/go/src/math/rand/rand.go:99
math/rand.(*Rand).Int31n(0xc0006aa5d0, 0x5e3a2b6400000064, 0x7f1486ac3a68)
    /usr/local/go/src/math/rand/rand.go:134 +0x5f
math/rand.(*Rand).Intn(0xc0006aa5d0, 0x64, 0xc00003e090)
    /usr/local/go/src/math/rand/rand.go:172 +0x45
github.com/SkyAPM/go2sky.(*RandomSampler).IsSampled(0xc00035efd8, 0xc00012e7c8, 0x18, 0x0)
    /builds/XseneFNy/0/cube/server/log-agent/builds/go/pkg/mod/github.com/!sky!a!p!m/go2sky@v1.2.1-0.20211117143848-ff3fce803a4a/sampler.go:55 +0x38
github.com/SkyAPM/go2sky.(*DynamicSampler).IsSampled(0xc0004ba9a0, 0xc00012e7c8, 0x18, 0x0)
    /builds/XseneFNy/0/cube/server/log-agent/builds/go/pkg/mod/github.com/!sky!a!p!m/go2sky@v1.2.1-0.20211117143848-ff3fce803a4a/sampler.go:79 +0x48
github.com/SkyAPM/go2sky.(*Tracer).CreateLocalSpan(0xc0004ab500, 0x1654c70, 0xc000384078, 0xc000628dc0, 0x2, 0x2, 0x0, 0x7f1460004f00, 0xf00, 0x48a165, ...)
    /builds/XseneFNy/0/cube/server/log-agent/builds/go/pkg/mod/github.com/!sky!a!p!m/go2sky@v1.2.1-0.20211117143848-ff3fce803a4a/trace.go:123 +0x387
github.com/SkyAPM/go2sky.(*Tracer).CreateExitSpanWithContext(0xc0004ab500, 0x1654c70, 0xc000384078, 0xc00012e7c8, 0x18, 0xc0002aec87, 0x37, 0xc000628f68, 0x0, 0xc000628f00, ...)
    /builds/XseneFNy/0/cube/server/log-agent/builds/go/pkg/mod/github.com/!sky!a!p!m/go2sky@v1.2.1-0.20211117143848-ff3fce803a4a/trace.go:152 +0x197
github.com/SkyAPM/go2sky.(*Tracer).CreateExitSpan(...)
    /builds/XseneFNy/0/cube/server/log-agent/builds/go/pkg/mod/github.com/!sky!a!p!m/go2sky@v1.2.1-0.20211117143848-ff3fce803a4a/trace.go:139

Desktop (please complete the following information):

Additional context we found this problem at a ran 22 days process. It was happened 3 times up to now. I don't know if related to concurrent requests, that's a guess.

Luckyboys commented 2 years ago

in go issue was said rand.Rand need use in only one goroutine. using a mutex is the right solution for multiple goroutine. https://github.com/golang/go/issues/3611

so, either we make the RandomSampler has a mutex to control the write operation only one for a goroutine, or let the rand.Rand has multiple copies in every goroutine (it like a random computer pool).

wu-sheng commented 2 years ago

I am not sure in Go, but at least in Java, we keep one random seed per thread. A lock is not acceptable, because sampling is for better performance, lock is on the opposite way.

Luckyboys commented 2 years ago

In this https://github.com/SkyAPM/go2sky/blame/master/sampler.go.(RandomSampler) implement, we only create one RandomSampler and set it to the tracer.

no matter how we created the tracer, we using the sampler entity is only one. so, if multiple goroutine to use it to call IsSampled, the IsSampled implement

func (s *RandomSampler) IsSampled(operation string) bool {
    return s.threshold > s.rand.Intn(100)
}

it has propability called s.rand.Intn(100) in multiple goroutine.

if the mutex solution is not acceptable, I suggest use a only one goroutine to create the ramdom number and push to a channel, other read goroutine is pull the random number from the channel. that's will be reduces performance loss due to locking.

hanahmily commented 2 years ago

I suggest use a only one goroutine to create the ramdom number and push to a channel, other read goroutine is pull the random number from the channel. that's will be reduces performance loss due to locking.

How about using a new rand in every goroutine? it's very neat, might be faster than mutex. If we have concerns about the overhead of the object allocation, the object pool should improve it.

Anyway, a benchmark is always helpful to find the ideal solution.

Luckyboys commented 2 years ago

How about using a new rand in every goroutine? it's very neat, might be faster than mutex.

a question is how to known current goroutine was created the rand.Rand entity. As I knew, goroutine is memory shared. It doesn't like working mechanism of thread. so, we can't check the variable was created in this goroutine.

And I was benchmarked current RandomSampler, used mutex solution, and used channel to store generate random number solution.

Go Benchmark Results

CPU: Apple silicon M1 Pro 10 cores
Memory: 16GB
goos: darwin
goarch: arm64
pkg: github.com/SkyAPM/go2sky
BenchmarkRandomSampler_IsSampled-10                 195834091            5.965 ns/op
BenchmarkRandomMutexSampler_IsSampled-10             8900761           144.6 ns/op
BenchmarkRandomGlobalSampler_IsSampled-10            8900469           146.2 ns/op
BenchmarkRandomChannelSampler0_IsSampled-10          2612180           442.4 ns/op
BenchmarkRandomChannelSampler1_IsSampled-10          2770339           386.1 ns/op
BenchmarkRandomChannelSampler128_IsSampled-10        4160923           296.5 ns/op
BenchmarkRandomChannelSampler1024_IsSampled-10       4435966           293.8 ns/op
BenchmarkRandomPoolSampler_IsSampled-10             190921699            6.359 ns/op
PASS
ok      github.com/SkyAPM/go2sky    16.800s

The conclusion is non mutex is very fast, but the rand.Rand is not concurrency safe. so it can't use in production. However, if we need concurrency safe, it's at least lock once when use the rand.Rand to generate a random number.

for example, the entity pool solution is need a lock to get and put the rand.Rand entity. or the go channel solution, because go channel it using a lock in internal implement. but if use sync.pool to implement pool, the lock is use internal lock, it will try to use optimistic lock in internal, so the performance will great better than to use mutex to impletement it.

Sampler Implement Codes

// RandomMutexSampler Use mutex to implement concurrent-safe for randomizer.
type RandomMutexSampler struct {
    mutex        sync.Mutex
    samplingRate float64
    rand         *rand.Rand
    threshold    int
}

// IsSampled implements IsSampled() of Sampler.
func (s *RandomMutexSampler) IsSampled(operation string) bool {
    s.mutex.Lock()
    defer s.mutex.Unlock()
    return s.threshold > s.rand.Intn(100)
}

func (s *RandomMutexSampler) init() {
    s.rand = rand.New(rand.NewSource(time.Now().Unix()))
    s.threshold = int(s.samplingRate * 100)
}

func NewRandomMutexSampler(samplingRate float64) *RandomMutexSampler {
    s := &RandomMutexSampler{
        samplingRate: samplingRate,
    }
    s.init()
    return s
}

// RandomChannelSampler Use channel to implement concurrent-safe for randomizer.
type RandomChannelSampler struct {
    samplingRate        float64
    randomNumberChannel chan int
    rand                *rand.Rand
    threshold           int
}

// IsSampled implements IsSampled() of Sampler.
func (s *RandomChannelSampler) IsSampled(operation string) bool {
    n := <-s.randomNumberChannel
    return s.threshold > n
}

func (s *RandomChannelSampler) init() {
    s.rand = rand.New(rand.NewSource(time.Now().Unix()))
    s.threshold = int(s.samplingRate * 100)
    go func() {
        for {
            s.randomNumberChannel <- s.rand.Intn(100)
        }
    }()
}

func NewRandomChannelSampler(samplingRate float64, channelSize int) *RandomChannelSampler {
    s := &RandomChannelSampler{
        samplingRate:        samplingRate,
        randomNumberChannel: make(chan int, channelSize),
    }
    s.init()
    return s
}

// RandomGlobalSampler Use global randomizer to ensure concurrent-safe, it's used a local lock too.
type RandomGlobalSampler struct {
    samplingRate float64
    threshold    int
}

// IsSampled implements IsSampled() of Sampler.
func (s *RandomGlobalSampler) IsSampled(operation string) bool {
    return s.threshold > rand.Intn(100)
}

func (s *RandomGlobalSampler) init() {
    s.threshold = int(s.samplingRate * 100)
}

func NewRandomGlobalSampler(samplingRate float64) *RandomGlobalSampler {
    s := &RandomGlobalSampler{
        samplingRate: samplingRate,
    }
    s.init()
    return s
}

// RandomPoolSampler Use sync.Pool to implement concurrent-safe for randomizer.
type RandomPoolSampler struct {
    samplingRate float64
    threshold    int
    initCount    int
    pool         sync.Pool
}

// IsSampled implements IsSampled() of Sampler.
func (s *RandomPoolSampler) IsSampled(operation string) bool {

    return s.threshold > s.generateRandomNumber()
}

func (s *RandomPoolSampler) init() {
    s.threshold = int(s.samplingRate * 100)
    for i := 0; i < s.initCount; i++ {
        s.pool.Put(s.newRand())
    }
    s.pool.New = s.newRand
}

func (s *RandomPoolSampler) generateRandomNumber() int {

    r := s.getRandomizer()

    randomNumber := r.Intn(100)

    s.returnRandomizer(r)

    return randomNumber
}

func (s *RandomPoolSampler) returnRandomizer(r *rand.Rand) {
    s.pool.Put(r)
}

func (s *RandomPoolSampler) getRandomizer() *rand.Rand {

    var r *rand.Rand

    generator := s.pool.Get()
    if generator == nil {
        generator = s.newRand()
    }

    r, ok := generator.(*rand.Rand)
    if !ok {
        r = s.newRand().(*rand.Rand) // it must be *rand.Rand
    }

    return r
}

func (s *RandomPoolSampler) newRand() interface{} {
    return rand.New(rand.NewSource(time.Now().UnixNano()))
}

func NewRandomPoolSampler(samplingRate float64, initCount int) *RandomPoolSampler {
    s := &RandomPoolSampler{
        samplingRate: samplingRate,
        initCount:    initCount,
    }
    s.init()
    return s
}

Go Test Codes

func BenchmarkRandomSampler_IsSampled(b *testing.B) {
    sampler := NewRandomSampler(0.5)
    operationName := "op"
    // will be panic
    /*b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            sampler.IsSampled(operationName)
        }
    })*/
    // Only run serial
    for i := 0; i < b.N; i++ {
        sampler.IsSampled(operationName)
    }
}

func BenchmarkRandomMutexSampler_IsSampled(b *testing.B) {
    sampler := NewRandomMutexSampler(0.5)
    operationName := "op"
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            sampler.IsSampled(operationName)
        }
    })
}

func BenchmarkRandomGlobalSampler_IsSampled(b *testing.B) {
    sampler := NewRandomGlobalSampler(0.5)
    operationName := "op"
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            sampler.IsSampled(operationName)
        }
    })
}

func BenchmarkRandomChannelSampler0_IsSampled(b *testing.B) {
    sampler := NewRandomChannelSampler(0.5, 0)
    operationName := "op"
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            sampler.IsSampled(operationName)
        }
    })
}

func BenchmarkRandomChannelSampler1_IsSampled(b *testing.B) {
    sampler := NewRandomChannelSampler(0.5, 1)
    operationName := "op"
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            sampler.IsSampled(operationName)
        }
    })
}

func BenchmarkRandomChannelSampler128_IsSampled(b *testing.B) {
    sampler := NewRandomChannelSampler(0.5, 128)
    operationName := "op"
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            sampler.IsSampled(operationName)
        }
    })
}

func BenchmarkRandomChannelSampler1024_IsSampled(b *testing.B) {
    sampler := NewRandomChannelSampler(0.5, 1024)
    operationName := "op"
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            sampler.IsSampled(operationName)
        }
    })
}

func BenchmarkRandomPoolSampler_IsSampled(b *testing.B) {
    sampler := NewRandomPoolSampler(0.5, runtime.NumCPU())
    operationName := "op"
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            sampler.IsSampled(operationName)
        }
    })
}