42wim / matterircd

Connect to your mattermost or slack using your IRC-client of choice.
MIT License
294 stars 61 forks source link

100% cpu usage #223

Closed flexd closed 5 years ago

flexd commented 5 years ago

Hi there.

I started matterircd using the Docker container, and it works fine sudo docker run -d --name matterircd -p 6667:6667 42wim/matterircd:latest -bind 0.0.0.0:6667 but I just noticed that matterircd is pinned to 100% cpu, which is a problem since this little droplet only has 1 core and also seems completely pointless.

Doing a strace of the process reveals it looks to be polling something tens of thousands of times per second. This is a small excerpt:

[pid  8645] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid  8755] epoll_pwait(4,  <unfinished ...>
[pid  8645] <... nanosleep resumed> NULL) = 0
[pid  8645] futex(0x56527f09ef90, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...>
[pid  8755] <... epoll_pwait resumed> [], 128, 0, NULL, 33555305579222796) = 0
[pid  8755] futex(0x56527f09ef90, FUTEX_WAKE_PRIVATE, 1) = 1
[pid  8645] <... futex resumed> )       = 0
[pid  8645] sched_yield()               = 0
[pid  8645] futex(0x56527f09eeb0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  8645] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid  8755] futex(0xc00002e840, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid  8645] <... nanosleep resumed> NULL) = 0
[pid  8645] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid  8755] <... futex resumed> )       = 1
[pid  8646] <... futex resumed> )       = 0
[pid  8646] epoll_pwait(4,  <unfinished ...>
[pid  8645] <... nanosleep resumed> NULL) = 0
[pid  8645] futex(0x56527f09ef90, FUTEX_WAIT_PRIVATE, 0, {tv_sec=60, tv_nsec=0} <unfinished ...>
[pid  8646] <... epoll_pwait resumed> [], 128, 0, NULL, 33555305593471377) = 0
[pid  8646] futex(0x56527f09ef90, FUTEX_WAKE_PRIVATE, 1) = 1
[pid  8645] <... futex resumed> )       = 0
[pid  8645] sched_yield()               = 0
[pid  8645] futex(0x56527f09eeb0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid  8755] futex(0x56527f0a2fc0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=30231115} <unfinished ...>
[pid  8646] futex(0x56527f09eeb0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid  8645] <... futex resumed> )       = 0
[pid  8646] epoll_pwait(4,  <unfinished ...>
[pid  8645] futex(0x56527f09eeb0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  8645] epoll_pwait(4, [], 128, 0, NULL, 33555305601673686) = 0
[pid  8645] nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>
[pid  8646] <... epoll_pwait resumed> [], 128, 0, NULL, 33555305600979186) = 0
[pid  8645] <... nanosleep resumed> NULL) = 0

What's going on here? I have not dived into the Matterircd code here yet, but will in a second.

42wim commented 5 years ago

so you just run the docker command above and it takes 100% CPU ? Nothing else happens, like connections to matterircd or so?

flexd commented 5 years ago

Matterircd works fine. I can connect to it, I am connected to Mattermost through it. It's not a very busy mattermost (like 10 channels and not a lot of activity). Seems like it's working as intended other than the high cpu usage

42wim commented 5 years ago

That's very strange, run with -debug and see if something is going on there. What mattermost version?

flexd commented 5 years ago

Mattermost Version: 5.5.1

time="2019-01-28T10:20:01Z" level=debug msg="WsReceiver event: model.WebSocketEvent{Event:"preferences_changed", Data:map[string]interface {}{"preferences":"[{\"user_id\":\"5dfb<redacted>\",\"category\":\"channel_approximate_│···············································································
view_time\",\"name\":\"\",\"value\":\"1548670801655\"}]"}, Broadcast:(*model.WebsocketBroadcast)(0xc000275800), Sequence:10382}" prefix=matterclient                                                                                         │···············································································
time="2019-01-28T10:20:05Z" level=debug msg="WsReceiver event: model.WebSocketEvent{Event:"channel_viewed", Data:map[string]interface {}{"channel_id":"<redacted>pwq68oxy"}, Broadcast:(*model.WebsocketBroadcast)(0xc000275840), Seq│···············································································
uence:10383}" prefix=matterclient                                                                                                                                                                                                            │···············································································
time="2019-01-28T10:20:24Z" level=debug msg="posting lastview "<redacted>"" prefix=matterclient                                                                                                                              │···············································································
time="2019-01-28T10:20:25Z" level=debug msg="WsReceiver event: model.WebSocketEvent{Event:"channel_viewed", Data:map[string]interface {}{"channel_id":"<redacted>68oxy"}, Broadcast:(*model.WebsocketBroadcast)(0xc000275b80), Seq│···············································································
uence:10384}" prefix=matterclient                                                                                                                                                                                                            │···············································································
time="2019-01-28T10:20:35Z" level=debug msg="WS PING" prefix=matterclient                                                                                                                                                                    │···············································································
time="2019-01-28T10:20:35Z" level=debug msg="sendWsRequest &model.WebSocketRequest{Seq:3386, Action:"ping", Data:map[string]interface {}(nil), Session:model.Session{Id:"", Token:"", CreateAt:0, ExpiresAt:0, LastActivityAt:0, UserId:"", D│···············································································
eviceId:"", Roles:"", IsOAuth:false, Props:model.StringMap(nil), TeamMembers:[]*model.TeamMember(nil)}, T:(i18n.TranslateFunc)(nil), Locale:""}" prefix=matterclient                                                                         │···············································································
time="2019-01-28T10:20:35Z" level=debug msg="WsReceiver response: model.WebSocketResponse{Status:"OK", SeqReply:3386, Data:map[string]interface {}{"server_time":1.548670855126e+12, "text":"pong", "version":"5.5.0", "node_id":""}, Error:(│···············································································
*model.AppError)(nil)}" prefix=matterclient                                                                                                                                                                                                  │···············································································
time="2019-01-28T10:20:35Z" level=debug msg="WS PONG received" prefix=matterclient                                                                                                                                                           │···············································································
Ftime="2019-01-28T10:20:46Z" level=debug msg="posting lastview "<redacted>xdqubnoe"" prefix=matterclient                                                                                                                             │···············································································
time="2019-01-28T10:20:47Z" level=debug msg="posting lastview "sh<redacted>68oxy"" prefix=matterclient                                                                                                                              │···············································································
time="2019-01-28T10:20:47Z" level=debug msg="WsReceiver event: model.WebSocketEvent{Event:"channel_viewed", Data:map[string]interface {}{"channel_id":"s<redacted>bfz5qpwq68oxy"}, Broadcast:(*model.WebsocketBroadcast)(0xc0000113c0), Seq│···············································································
uence:10385}" prefix=matterclient                                                                                                                                                                                                            │···············································································
time="2019-01-28T10:20:48Z" level=debug msg="WsReceiver event: model.WebSocketEvent{Event:"channel_viewed", Data:map[string]interface {}{"channel_id":"z1bdp<redacted>dbznkdxdqubnoe"}, Broadcast:(*model.WebsocketBroadcast)(0xc000011540), Seq│···············································································
uence:10386}" prefix=matterclient 

Doesn't look to be doing anything special.. Loglines aren't exactly flying by here, so no idea what is doing to use so much CPU.

strace shows it absolutely spamming this

read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0                                                                                                                                                                                                  │···············································································
read(7, "", 4096)                       = 0 

So I would guess somewhere there is a unlimited for {} loop reading a socket or something else

flexd commented 5 years ago

I am only connected to two teams (same server), and the message volume is not that high.

flexd commented 5 years ago

I suspect this is the culprit:

https://github.com/42wim/matterircd/blob/84eecb73c04635add932d30fb1f2b20be41452fc/mm-go-irckit/user.go#L195

flexd commented 5 years ago

I suspect this is the culprit:

matterircd/mm-go-irckit/user.go Line 195 in 84eecb7 msg, err := user.Conn.Decode()

That isn't it ^, and I have no idea why I thought so. Compiling matterircd myself and not running the docker container seems to have resolved the problem.

42wim commented 5 years ago

can't seem to reproduce this and as your problem is solved by not using docker, going to close this for now :)

pbui commented 5 years ago

I too am having the problem of 100% CPU usage, which adds a lot of latency to matterircd (built it from source, not using docker, only 3-5 users). Here is a snapshot from pprof:

Showing nodes accounting for 1.13s, 3.88% of 29.14s total
Dropped 29 nodes (cum <= 0.15s)
Showing top 5 nodes out of 32
      flat  flat%   sum%        cum   cum%
     0.03s   0.1%   0.1%     29.08s 99.79%  github.com/42wim/matterircd/mm-go-irckit.(*User).Decode
     0.30s  1.03%  1.13%        29s 99.52%  github.com/42wim/matterircd/vendor/github.com/sorcix/irc.(*Decoder).Decode
     0.16s  0.55%  1.68%     28.03s 96.19%  bufio.(*Reader).ReadString
     0.33s  1.13%  2.81%     27.83s 95.50%  bufio.(*Reader).ReadBytes
     0.31s  1.06%  3.88%     27.09s 92.96%  bufio.(*Reader).ReadSlice

The gateway still works, but it is very slow and bogs down the server.

pbui commented 5 years ago

Call Graph

pbui commented 5 years ago

Note, I had set PasteBufferTimeout = 0 because the latency was so bad and perhaps this exacerbated the problem.

I just rolled back to v0.18.4 and I am not noticing the same 100% CPU usage.