Instead of using the first response that comes along as a base for our
reply and merging other responses into it, we set up an empty response
that we merge all of them into. This fixes the race condition unveiled in
commit a53ec78, as well as a host of race conditions we encountered in
production, starting with [1] below.
I'm not sure what the problem here was. At first I thought that merging
the responses inside the select statement was the source of our
problems, so I moved the merging out of it. That got rid of the data
race in [1], but introduced new ones, where the slice in
first.Response.Metrics was being modified somewhere else while we were
looping through it. I was not able to find where this was actually
happening after staring at stack traces and program internals with
delve, so we'll content ourselves with having whacked this mole somehow,
mysteriously.
[1] Production data race stacktrace:
==================
WARNING: DATA RACE
Read at 0x00c42012c380 by goroutine 946:
github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/types.(*ServerFetchResponse).Merge()
/home/redacted/go/src/github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/types/stats.go:23 +0xc2b
github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/broadcast.(*BroadcastGroup).Fetch()
/home/redacted/go/src/github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group.go:234 +0xcd2
github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/broadcast.(*BroadcastGroup).doSingleFetch()
/home/redacted/go/src/github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group.go:179 +0x77f
Previous write at 0x00c42012c380 by goroutine 1720:
github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/types.(*ServerFetchResponse).Merge()
/home/redacted/go/src/github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/types/stats.go:23 +0xc1d
github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/broadcast.(*BroadcastGroup).Fetch()
/home/redacted/go/src/github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group.go:234 +0xcd2
github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/broadcast.(*BroadcastGroup).doSingleFetch()
/home/redacted/go/src/github.com/go-graphite/carbonapi/vendor/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group.go:179 +0x77f
[...]