Closed gunnihinn closed 6 years ago
The cache key is intentionally the same, because otherwise there would be no point in caching :)
I think there might be another broken thing: https://github.com/go-graphite/carbonzipper/blob/master/zipper/cache/query.go#L37-L44
I used to return "nil, false" to signal "We need to fetch data!". But after the refactor we also return "nil, false" in case of "ctx.Done()", and that's actually wrong: it should be "nil, true", and the check here should be modified as well: https://github.com/go-graphite/carbonzipper/blob/master/zipper/broadcast/broadcast_group.go#L199
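To make the "nil, false" vs "nil, true" distinction concrete, here is a minimal sketch of the intended contract. It is not the actual carbonzipper code: the `item` type and the `fetchOrLock`, `store`, and `scenario` names are made up for illustration, and the real QueryItem may be structured differently. The only point is that "nil, false" is reserved for the one caller that has to fetch, while a cancelled waiter gets "nil, true".

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// item is a hypothetical stand-in for a cached query entry (not the real QueryItem).
type item struct {
	inFlight int32         // 0 = nobody is fetching, 1 = a leader is fetching
	done     chan struct{} // closed by the leader once data is stored
	data     interface{}   // written before done is closed, read only after
}

func newItem() *item { return &item{done: make(chan struct{})} }

// fetchOrLock returns (nil, false) only to the caller that won the CAS and
// therefore must fetch the data itself. Every other outcome, including a
// cancelled context, returns found == true so nobody else starts a fetch.
func (it *item) fetchOrLock(ctx context.Context) (interface{}, bool) {
	if atomic.CompareAndSwapInt32(&it.inFlight, 0, 1) {
		return nil, false // we are the leader: "We need to fetch data!"
	}
	select {
	case <-ctx.Done():
		return nil, true // cancelled while waiting: no data, but do not fetch
	case <-it.done:
		return it.data, true
	}
}

// store publishes the result and wakes up all waiters.
func (it *item) store(v interface{}) {
	it.data = v
	close(it.done)
}

// scenario runs a leader, a cancelled waiter, and a late reader in sequence.
func scenario() (leaderFound, cancelledFound bool, final string) {
	it := newItem()
	_, leaderFound = it.fetchOrLock(context.Background())

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // the waiter's request is already cancelled
	_, cancelledFound = it.fetchOrLock(ctx)

	it.store("datapoints")
	v, _ := it.fetchOrLock(context.Background())
	final, _ = v.(string)
	return leaderFound, cancelledFound, final
}

func main() {
	leaderFound, cancelledFound, final := scenario()
	fmt.Println(leaderFound, cancelledFound, final)
}
```

With this contract the caller has to treat "nil, true" as "no result, give up" rather than as a cache hit, which is why the check on the calling side needs to change too.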
I've written a test that reproduces the crash about 20% of the time (the other 80% it just locks up).
I just cancel the request and issue a dozen fetches in parallel. I will commit the test later today (even if I can't figure out what causes the lock-up).
==================
WARNING: DATA RACE
Write at 0x00c420466368 by goroutine 11:
github.com/go-graphite/carbonzipper/zipper/cache.(*QueryItem).StoreAbort()
/home/civil/go/src/github.com/go-graphite/carbonzipper/zipper/cache/query.go:58 +0xa6
github.com/go-graphite/carbonzipper/zipper/broadcast.(*BroadcastGroup).Find()
/home/civil/go/src/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group.go:386 +0x1177
github.com/go-graphite/carbonzipper/zipper/broadcast.(*BroadcastGroup).doSingleFetch()
/home/civil/go/src/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group.go:141 +0xcf4
Previous read at 0x00c420466368 by goroutine 14:
github.com/go-graphite/carbonzipper/zipper/cache.(*QueryItem).FetchOrLock()
/home/civil/go/src/github.com/go-graphite/carbonzipper/zipper/cache/query.go:45 +0xdb
github.com/go-graphite/carbonzipper/zipper/broadcast.(*BroadcastGroup).Find()
/home/civil/go/src/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group.go:316 +0x461
github.com/go-graphite/carbonzipper/zipper/broadcast.(*BroadcastGroup).doSingleFetch()
/home/civil/go/src/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group.go:141 +0xcf4
Goroutine 11 (running) created at:
github.com/go-graphite/carbonzipper/zipper/broadcast.(*BroadcastGroup).Fetch()
/home/civil/go/src/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group.go:214 +0x95d
github.com/go-graphite/carbonzipper/zipper/broadcast.TestFetchRequestsWithTimeout.func1.1()
/home/civil/go/src/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group_test.go:1302 +0xc0
Goroutine 14 (running) created at:
github.com/go-graphite/carbonzipper/zipper/broadcast.(*BroadcastGroup).Fetch()
/home/civil/go/src/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group.go:214 +0x95d
github.com/go-graphite/carbonzipper/zipper/broadcast.TestFetchRequestsWithTimeout.func1.1()
/home/civil/go/src/github.com/go-graphite/carbonzipper/zipper/broadcast/broadcast_group_test.go:1302 +0xc0
==================
https://github.com/go-graphite/carbonzipper/commit/cfa47cd8a8de0ec23a29c5500a153a524b80d5db
But that might not be related to this issue.
On my system the current test always panics with "panic: close of closed channel" (though it takes approx. 10 runs to actually panic).
Feel free to reopen if problem appears again.
Run current HEAD, make it talk to any store, and make concurrent requests to any single metric with cache disabled:
This will crash the carbonzipper process with a close of a closed channel; see [1].
The stack trace points to a manual call of a method that had been defer-called above, and which does close a channel. This is a red herring, as the issue is still reproducible after removing the manual call.
The real issue could be that the QueryItem cache keys are constructed just from the request data, so two requests for the same target that come in close enough to each other may be able to double-write a `*QueryItem` to the same cache key before one of them is locked. Once the cache key gets filled in, two or more goroutines then close the same channel.

Running with cache disabled only serves to exhibit the issue more quickly; we see this in production at Booking.com with the cache enabled.
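For illustration, here is one way to rule out both halves of the suspected problem. This is a sketch under assumptions, not the fix that went into carbonzipper. The `entry`, `cache`, `getOrCreate`, `finish`, and `raceForKey` names are hypothetical. It uses `sync.Map.LoadOrStore` so that concurrent requests for the same key all end up with the same entry, and `sync.Once` so that the channel is closed at most once no matter how many goroutines try.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical cache entry; done is closed when the query finishes.
type entry struct {
	done chan struct{}
	once sync.Once
}

// finish closes done exactly once, even when called from many goroutines.
func (e *entry) finish() {
	e.once.Do(func() { close(e.done) })
}

// cache maps a request key to its *entry.
type cache struct {
	m sync.Map
}

// getOrCreate returns the entry for key and reports whether this call created it.
// LoadOrStore is atomic, so two racing callers can never both install an entry.
func (c *cache) getOrCreate(key string) (*entry, bool) {
	fresh := &entry{done: make(chan struct{})}
	actual, loaded := c.m.LoadOrStore(key, fresh)
	return actual.(*entry), !loaded
}

// raceForKey starts n goroutines that all request the same key and all try to
// finish the entry. It returns how many callers created an entry and how many
// distinct entries were observed.
func raceForKey(n int) (creators, distinct int) {
	var (
		c    cache
		wg   sync.WaitGroup
		mu   sync.Mutex
		seen = map[*entry]struct{}{}
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			e, created := c.getOrCreate("target")
			e.finish() // would panic without sync.Once
			mu.Lock()
			if created {
				creators++
			}
			seen[e] = struct{}{}
			mu.Unlock()
		}()
	}
	wg.Wait()
	return creators, len(seen)
}

func main() {
	creators, distinct := raceForKey(12)
	fmt.Println("creators:", creators, "distinct entries:", distinct)
}
```

Guarding the close with `sync.Once` only hides the symptom if two different entries can still be written for one key, so the atomic get-or-create is the part that addresses the suspected root cause.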
[1] Stack trace: