fleaz / CptHook

Receive webhooks from different applications and post them to IRC channels
MIT License
17 stars 8 forks source link

SIGSEGV after failed reconnects #36

Closed andir closed 5 years ago

andir commented 5 years ago

This is probably a follow-up to #6. After a few failed reconnects the program crahses with a segmentation fault…

Dec 01 17:15:05 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-01T17:15:05Z" level=info msg="Setting loglevel defined by configuration" fields.level=info
Dec 01 17:15:05 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-01T17:15:05Z" level=info msg="Loaded module \"prometheus\""
Dec 01 17:15:05 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-01T17:15:05Z" level=info msg="Started HTTP Server" listen="[::1]:8083"
Dec 01 17:15:05 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-01T17:15:05Z" level=info msg="ChannelReceiver started"
Dec 10 16:22:05 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-10T16:22:05Z" level=warning msg="Connection to irc.hackint.eu:6697 terminated: timed out waiting for a requested PING response"
Dec 10 16:22:05 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-10T16:22:05Z" level=warning msg="Reconnecting to in 30 seconds..."
Dec 14 03:12:35 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:12:35Z" level=warning msg="Connection to irc.hackint.eu:6697 terminated: timed out waiting for a requested PING response"
Dec 14 03:12:35 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:12:35Z" level=warning msg="Reconnecting to in 30 seconds..."
Dec 14 03:13:10 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:13:10Z" level=warning msg="Connection to irc.hackint.eu:6697 terminated: dial tcp: i/o timeout"
Dec 14 03:13:10 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:13:10Z" level=warning msg="Reconnecting to in 30 seconds..."
Dec 14 03:13:45 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:13:45Z" level=warning msg="Connection to irc.hackint.eu:6697 terminated: dial tcp: i/o timeout"
Dec 14 03:13:45 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:13:45Z" level=warning msg="Reconnecting to in 30 seconds..."
Dec 14 03:14:17 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:14:17Z" level=warning msg="Connection to irc.hackint.eu:6697 terminated: dial tcp: lookup irc.hackint.eu: device or resource busy"
Dec 14 03:14:17 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:14:17Z" level=warning msg="Reconnecting to in 30 seconds..."
Dec 14 03:14:49 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:14:49Z" level=warning msg="Connection to irc.hackint.eu:6697 terminated: dial tcp: lookup irc.hackint.eu: device or resource busy"
Dec 14 03:14:49 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:14:49Z" level=warning msg="Reconnecting to in 30 seconds..."
Dec 14 03:15:21 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:15:21Z" level=warning msg="Connection to irc.hackint.eu:6697 terminated: dial tcp: lookup irc.hackint.eu: device or resource busy"
Dec 14 03:15:21 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:15:21Z" level=warning msg="Reconnecting to in 30 seconds..."
Dec 14 03:15:53 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:15:53Z" level=warning msg="Connection to irc.hackint.eu:6697 terminated: dial tcp: lookup irc.hackint.eu: device or resource busy"
Dec 14 03:15:53 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:15:53Z" level=warning msg="Reconnecting to in 30 seconds..."
Dec 14 03:16:25 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:16:25Z" level=warning msg="Connection to irc.hackint.eu:6697 terminated: dial tcp: lookup irc.hackint.eu: device or resource busy"
Dec 14 03:16:25 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:16:25Z" level=warning msg="Reconnecting to in 30 seconds..."
Dec 14 03:16:57 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:16:57Z" level=warning msg="Connection to irc.hackint.eu:6697 terminated: dial tcp: lookup irc.hackint.eu: device or resource busy"
Dec 14 03:16:57 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: time="2018-12-14T03:16:57Z" level=warning msg="Reconnecting to in 30 seconds..."
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: panic: runtime error: invalid memory address or nil pointer dereference
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6576bf]
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: goroutine 9 [running]:
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: github.com/lrstanley/girc.(*ircConn).rate(0x0, 0xd, 0x0)
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]:         /build/go/src/github.com/lrstanley/girc/conn.go:411 +0x2f
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: github.com/lrstanley/girc.(*Client).Send(0xc0000cc840, 0xc0001220e0)
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]:         /build/go/src/github.com/lrstanley/girc/conn.go:389 +0x14d
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: github.com/lrstanley/girc.(*Commands).Join(0xc00000e280, 0xc000322030, 0x1, 0x1)
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]:         /build/go/src/github.com/lrstanley/girc/commands.go:46 +0x1bc
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: main.joinChannel(0xc000022930, 0x8)
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]:         /build/go/src/github.com/fleaz/CptHook/irc.go:124 +0x29a
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: main.channelReceiver()
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]:         /build/go/src/github.com/fleaz/CptHook/irc.go:106 +0x169
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]: created by main.ircConnection
Dec 14 03:17:07 prometheus bsznfglaha9va05jzpbywy7h5x6ayg1g-unit-script-cpthook-start[7257]:         /build/go/src/github.com/fleaz/CptHook/irc.go:89 +0x2f1

All the reconnects before 2018-12-14 were successful (at some point). The bot timed out a few minutes before the program crashed :/.

fleaz commented 5 years ago

Thanks for reporting this.

The crash apparently happened in irc.go:124 where CptHook tries to join a new Channel. This happens in a different thread, while the first thread is still trying to reconnect to the IRC network. Which is the reason for the segmentation violation because there is no client object.

The girc library doesn't appear to be thread safe, because we already had the same problem when CptHook tries to send a message during the reconnect (See https://github.com/lrstanley/girc/issues/14 )

My idea would be to implement some kind of locking myself to prevent CptHook from using the irc client object durring reconnect

fleaz commented 5 years ago

This is fixed in v0.3.0