FOSSRIT / infrastructure

Set of scripts, Ansible playbooks/roles, and other tools to automate and manage FOSS@MAGIC infrastructure
BSD 3-Clause "New" or "Revised" License

Service disruption: Matterbridge went down with odd error #58

Closed. Tjzabel closed this issue 4 years ago.

Tjzabel commented 5 years ago

Summary

I had to restart the Slack bridge this afternoon. Matterbridge went offline this morning with an odd error:

```
Sep 09 22:41:04 ritlug-irc matterbridge[7829]: time="2019-09-09T22:41:04-04:00" level=info msg="irc.freenode: joining #rit (ID: #ritirc.freenode)" prefix=irc
Sep 09 22:41:05 ritlug-irc matterbridge[7829]: time="2019-09-09T22:41:05-04:00" level=info msg="irc.freenode: joining #rit-foss (ID: #rit-fossirc.freenode)" prefix=irc
Sep 09 22:41:06 ritlug-irc matterbridge[7829]: time="2019-09-09T22:41:06-04:00" level=info msg="irc.freenode: joining #rit-lug-projects (ID: #rit-lug-projectsirc.freenode)" prefix=irc
Sep 09 22:41:08 ritlug-irc matterbridge[7829]: time="2019-09-09T22:41:08-04:00" level=info msg="irc.freenode: joining #rit-lug-sysadmin (ID: #rit-lug-sysadminirc.freenode)" prefix=irc
Sep 09 23:08:12 ritlug-irc matterbridge[7829]: time="2019-09-09T23:08:12-04:00" level=error msg="Connection failed "slack rate limit exceeded, retry after 59s" &slack.RateLimitedError{RetryAfter:59000000000}" prefix=slack
Sep 09 23:08:12 ritlug-irc matterbridge[7829]: time="2019-09-09T23:08:12-04:00" level=error msg="Connection failed "slack rate limit exceeded, retry after 59s" &slack.RateLimitedError{RetryAfter:59000000000}" prefix=slack
Sep 09 23:26:55 ritlug-irc matterbridge[7829]: time="2019-09-09T23:26:55-04:00" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
Sep 09 23:26:55 ritlug-irc matterbridge[7829]: time="2019-09-09T23:26:55-04:00" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
Sep 10 07:43:01 ritlug-irc matterbridge[7829]: time="2019-09-10T07:43:01-04:00" level=error msg="Connection failed "slack rate limit exceeded, retry after 59s" &slack.RateLimitedError{RetryAfter:59000000000}" prefix=slack
Sep 10 07:43:01 ritlug-irc matterbridge[7829]: time="2019-09-10T07:43:01-04:00" level=error msg="Connection failed "slack rate limit exceeded, retry after 59s" &slack.RateLimitedError{RetryAfter:59000000000}" prefix=slack
```

Expected results

  1. Matterbridge is online
  2. I send a message on Slack
  3. I see it on IRC

Actual results

  1. Matterbridge is offline
  2. I send a message on Slack
  3. I don't see the message on IRC

Priority requested

jwflory commented 4 years ago

@Tjzabel I'm going to close this since v1.16.1 was deployed in #59. If you notice the issue again, please reopen.

fshofmann commented 4 years ago

Hey there, I had a similar issue recently. I checked the log after running the bridge for 10 days:

```
time="2020-07-15T00:00:14Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T00:00:14Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T01:00:29Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T01:00:29Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T06:54:56Z" level=info msg="Large slack detected > 2000 users, skipping loading complete userlist." prefix=slack
time="2020-07-15T07:00:34Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T07:00:34Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T08:00:26Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T08:00:26Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T11:00:04Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T11:00:04Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T13:00:26Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T13:00:26Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T15:07:14Z" level=info msg="Large slack detected > 2000 users, skipping loading complete userlist." prefix=slack
time="2020-07-15T16:00:38Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T16:00:38Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T19:00:12Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T19:00:12Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-15T23:17:40Z" level=error msg="Could not retrieve users: slack.statusCodeError{Code:500, Status:"500 Internal Server Error"}" prefix=slack
time="2020-07-16T00:00:20Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-16T00:00:20Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-16T01:00:04Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-16T01:00:04Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-16T07:00:11Z" level=error msg="Could not retrieve bot information: &errors.errorString{s:"bot_not_found"}" prefix=slack
time="2020-07-16T07:00:11Z" level=error msg="&errors.errorString{s:"bot_not_found"}" prefix=slack
```

We considered this "normal": apart from the odd log entries, everything seemed fine, and the Slack-Discord gateway had no issues. But the next day I ran into the same issue as the OP: Discord messages were still relayed to Slack, but not the other way around.