Closed: rschmukler closed this 8 years ago
We did find a data race this week in this hook:
Read by goroutine 74:
sync.raceRead()
/private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/sync/race.go:37 +0x2e
sync.(*WaitGroup).Add()
/private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/sync/waitgroup.go:66 +0xfa
github.com/gemnasium/logrus-graylog-hook.(*GraylogHook).Fire()
/Users/talbright/Code/go/src/github.com/gemnasium/logrus-graylog-hook/graylog_hook.go:82 +0xa1
github.com/Sirupsen/logrus.LevelHooks.Fire()
/Users/talbright/Code/go/src/github.com/Sirupsen/logrus/hooks.go:28 +0x144
github.com/Sirupsen/logrus.Entry.log()
/Users/talbright/Code/go/src/github.com/Sirupsen/logrus/entry.go:88 +0x19c
github.com/Sirupsen/logrus.(*Entry).Debug()
/Users/talbright/Code/go/src/github.com/Sirupsen/logrus/entry.go:119 +0xe2
github.com/assistly/birdhouse/dist.(*Dist).loop()
/Users/talbright/Code/go/src/github.com/assistly/birdhouse/dist/dist.go:103 +0x17a
Previous write by goroutine 68:
sync.raceWrite()
/private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/sync/race.go:41 +0x2e
sync.(*WaitGroup).Wait()
/private/var/folders/bn/49l0ffts381_wqsh02663nr40000gn/T/workdir/go/src/sync/waitgroup.go:124 +0xf9
github.com/gemnasium/logrus-graylog-hook.(*GraylogHook).Flush()
/Users/talbright/Code/go/src/github.com/gemnasium/logrus-graylog-hook/graylog_hook.go:91 +0x3a
github.com/gemnasium/logrus-graylog-hook.(*GraylogHook).Fire()
/Users/talbright/Code/go/src/github.com/gemnasium/logrus-graylog-hook/graylog_hook.go:84 +0x112
github.com/Sirupsen/logrus.LevelHooks.Fire()
/Users/talbright/Code/go/src/github.com/Sirupsen/logrus/hooks.go:28 +0x144
github.com/Sirupsen/logrus.Entry.log()
/Users/talbright/Code/go/src/github.com/Sirupsen/logrus/entry.go:88 +0x19c
github.com/Sirupsen/logrus.(*Entry).Info()
/Users/talbright/Code/go/src/github.com/Sirupsen/logrus/entry.go:129 +0xe2
github.com/Sirupsen/logrus.(*Entry).Infof()
/Users/talbright/Code/go/src/github.com/Sirupsen/logrus/entry.go:173 +0x197
github.com/Sirupsen/logrus.(*Entry).Printf()
/Users/talbright/Code/go/src/github.com/Sirupsen/logrus/entry.go:178 +0x60
github.com/talbright/go-zookeeper/zk.(*Conn).loop()
/Users/talbright/Code/go/src/github.com/talbright/go-zookeeper/zk/conn.go:315 +0x933
github.com/talbright/go-zookeeper/zk.Connect.func1()
/Users/talbright/Code/go/src/github.com/talbright/go-zookeeper/zk/conn.go:205 +0x32
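For context, the trace boils down to one goroutine calling wg.Add inside Fire while another goroutine sits in wg.Wait inside Flush, both on the same sync.WaitGroup, with nothing serializing the two. A stripped-down sketch of that pattern (hypothetical names, not the hook's actual code) that go run -race tends to flag in the same way, and that can also trip the WaitGroup misuse panic:

package main

import "sync"

// Stripped-down sketch (hypothetical, not the hook's code) of the racing
// pattern: Add is called concurrently with Wait on the same WaitGroup,
// with nothing serializing the two.
type racyHook struct {
	wg  sync.WaitGroup
	buf chan string
}

func (h *racyHook) Fire(msg string) {
	h.wg.Add(1) // races with Wait in Flush
	h.buf <- msg
}

func (h *racyHook) Flush() {
	h.wg.Wait() // races with Add in Fire
}

func main() {
	h := &racyHook{buf: make(chan string, 1024)}
	go func() {
		// drain the buffer, marking each entry as sent
		for range h.buf {
			h.wg.Done()
		}
	}()

	var done sync.WaitGroup
	for i := 0; i < 100; i++ {
		done.Add(2)
		go func() { defer done.Done(); h.Fire("entry") }()
		go func() { defer done.Done(); h.Flush() }()
	}
	done.Wait()
}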
After applying this patch, the race conditions cleared up.
Great work, thanks!
This PR fixes a few issues that we ran into while running this in a few of our apps...
1) ad4e32b - as written, the Async logger actually behaved the same as the Sync logger and would always wait for the entry to be sent before returning.
2) 6638df5 - a much more interesting problem. If two goroutines called Log at the same time, one of them could increment the counter via wg.Add while the other had called Flush (wg.Wait). Go's sync.WaitGroup has a check for when this happens (see waitgroup.go#L129) and panics. I added a test that covers this (although most CI only gives a single core, so the test might never fail on CI anyway) and changed the implementation so that the hook sends the entry directly when synchronous and otherwise uses the buffer. Additionally, a sync.Mutex is added to prevent additional logging while the AsyncHook is being flushed.
3) c0f23b0 - just switched to a RWMutex, allowing multiple goroutines to Write simultaneously while Flush still blocks all of them from writing. This is a bit awkward naming-wise (since we allow multiple writes and one Read, as opposed to multiple Reads and one Write...); a rough sketch of the resulting pattern follows below.
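A rough sketch of the locking pattern from 2) and 3), with hypothetical names rather than the hook's actual types: in sync mode the entry is sent directly and the buffer is skipped; in async mode Fire takes the read side of the RWMutex before wg.Add, while Flush takes the write side before wg.Wait, so Add and Wait can never overlap and concurrent Fire calls still don't block each other:

package main

import "sync"

// Rough sketch of the pattern described above (hypothetical names, not the
// hook's real implementation): sync mode sends directly, async mode buffers,
// and a RWMutex lets many Fire calls run concurrently while Flush excludes
// all of them for the duration of the Wait.
type sketchHook struct {
	synchronous bool
	buf         chan string
	wg          sync.WaitGroup
	mu          sync.RWMutex // Fire takes the "read" side, Flush the "write" side
}

func (h *sketchHook) send(msg string) {
	_ = msg // stand-in for actually writing the message out
}

func (h *sketchHook) Fire(msg string) {
	if h.synchronous {
		h.send(msg) // sync mode: skip the buffer entirely
		return
	}
	h.mu.RLock() // many concurrent Fire calls may hold this at once...
	defer h.mu.RUnlock()
	h.wg.Add(1) // ...and Add can no longer overlap with Wait in Flush
	h.buf <- msg
}

func (h *sketchHook) Flush() {
	h.mu.Lock() // blocks new entries while the buffer drains
	defer h.mu.Unlock()
	h.wg.Wait()
}

func (h *sketchHook) loop() {
	for msg := range h.buf {
		h.send(msg)
		h.wg.Done()
	}
}

func main() {
	h := &sketchHook{buf: make(chan string, 128)}
	go h.loop()
	for i := 0; i < 10; i++ {
		go h.Fire("entry")
	}
	h.Flush()
}

The naming awkwardness mentioned in 3) shows up here: the "readers" of the RWMutex are the goroutines writing log entries, and the single "writer" is Flush draining them.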