fraschetti / Octoslack

OctoPrint plugin for Slack, Mattermost, Pushbullet, Pushover, Rocket.Chat, Discord, Riot/Matrix, & Microsoft Teams
MIT License
74 stars 34 forks source link

high CPU load caused by very quick reconnection attempts #44

Closed hakan42 closed 5 years ago

hakan42 commented 5 years ago

I have just updated from 1.6.2 to 1.7.0. The 1.6.2 version is working correctly.

With 1.7.0, octoprint.log is flooded with reconnection attempts, which causes the CPU load to spike and makes printing impossible:

2019-01-26 13:33:48,853 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): math range error
2019-01-26 13:33:48,893 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): math range error
2019-01-26 13:33:48,912 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): math range error
2019-01-26 13:33:48,913 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): math range error
2019-01-26 13:33:48,914 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): math range error
2019-01-26 13:33:48,914 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): math range error
2019-01-26 13:33:48,915 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): math range error
2019-01-26 13:33:48,945 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): math range error
2019-01-26 13:33:49,049 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): math range error
2019-01-26 13:33:49,050 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): math range error
fraschetti commented 5 years ago

Hi @hakan42

I've found where the issue is happening and have a bit of a fix but it's not completely obvious why that issue would have even happened in the first place. I've published a 1.7.1 which you can pick up (the OctoPrint plugin manager's Advanced section will let you force an update check instead of waiting the usual 24 hours) if you want to see if it fixes your issue. If you do have the time, I'd appreciate it if you could gather some diagnostic infomration before you upgrade.

  1. Enable DEBUG logging
  2. Reproduce the issue
  3. Send Octoprint.log

Instructions to enable DEBUG logging for Octoslack

hakan42 commented 5 years ago

I installed 1.7.0 again on my machine and enabled debugging. The most obvious thing I can see is the following:

2019-01-27 15:05:12,082 - octoprint.plugins.Octoslack - DEBUG - Attempting to connect Slack RTM API (iteration=46)
2019-01-27 15:05:12,084 - octoprint.plugins.Octoslack - DEBUG - Sleeping for 300 seconds before attempting connection
2019-01-27 15:05:12,086 - octoprint.plugins.Octoslack - ERROR - Error Slack RTM API connection error (Exception): global name 'wait' is not defined

which runs in a very fast loop.

With 1.7.1, now I can't connect at all, the following is from an updated (and freshly started) octoprint:

2019-01-27 15:21:20,511 - octoprint.plugins.Octoslack - ERROR - Failed to reconnect via Slack RTM API
2019-01-27 15:21:20,512 - octoprint.plugins.Octoslack - DEBUG - Attempting to connect Slack RTM API (iteration=2)
2019-01-27 15:21:20,513 - octoprint.plugins.Octoslack - DEBUG - Sleeping for 20 seconds before attempting connection
hakan42 commented 5 years ago

I have to correct myself, sometimes the connections are successfull, but every few times, I get the "failed to reconnect". I cannot see any pattern yet.

fraschetti commented 5 years ago

Hi @hakan42 I've made a few more changes that I believe should help - prevent infinite loops & be resistant to additional failure modes. These changes are baking on a RasPi and Windows install on OctoPrint before I push a new release. I'll reply here once I have more information to share.

fraschetti commented 5 years ago

Hi @hakan42 I've realeased 1.7.2 which contains a fix for the reconnection delay logic and some additional logic to prevent getting stuck in error/exception loops. Please feel free to reopen this issue if you see the problem again.