Joystream / community-repo

A repo for community contribution and documentation
GNU General Public License v3.0
19 stars, 73 forks

Storage Providers health probe + Discord notifications #800

Closed: singulart closed 2 years ago

singulart commented 2 years ago

Context

Requested by Storage WG Lead @0x2bc as a must-have for mainnet.

Scope

A Discord bot that pings each storage provider's /storage/api/v1/state/data endpoint. The client timeout is 250 ms (configurable). Failure to receive an HTTP 200 within the timeout yields a notification in the #storage-providers channel.
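A minimal sketch of such a probe, assuming a fetch-compatible HTTP client and an AbortController-based timeout. The names probeNode, FetchLike and DEFAULT_TIMEOUT_MS are hypothetical and not taken from joy-disco-bots; only the endpoint path and the 250 ms default come from the scope above.

```typescript
// Hypothetical names: FetchLike, probeNode and DEFAULT_TIMEOUT_MS are illustrative only.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal }
) => Promise<{ status: number }>;

const STATE_PATH = "/storage/api/v1/state/data"; // endpoint path from the issue
const DEFAULT_TIMEOUT_MS = 250; // default client timeout from the issue

// Resolves to true only if the node answers with HTTP 200 before the timeout.
async function probeNode(
  baseUrl: string,
  fetchImpl: FetchLike,
  timeoutMs: number = DEFAULT_TIMEOUT_MS
): Promise<boolean> {
  const controller = new AbortController();
  // Abort the in-flight request once the timeout elapses.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Strip trailing slashes so the path is not doubled up.
    const url = baseUrl.replace(/\/+$/, "") + STATE_PATH;
    const response = await fetchImpl(url, { signal: controller.signal });
    return response.status === 200;
  } catch {
    // A timeout (abort) and a network error both count as a failed probe.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

On Node 18+ the built-in fetch could be passed as fetchImpl, for example probeNode(endpoint, fetch). Injecting the client keeps the probe testable without network access.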

To avoid spamming the channel, the following flow is suggested when a failing node is detected.

  1. Record the failing node's endpoint in a DB.
  2. Schedule a job that runs every 15 minutes (configurable), scans the DB records, and sends one "summary" notification for all failing nodes.
  3. If a node comes back up, remove its DB record.
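The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the bot's actual code: FailingNodeStore is a hypothetical in-memory stand-in for the DB, and runSummaryJob, notify and SUMMARY_INTERVAL_MS are made-up names.

```typescript
// Hypothetical in-memory stand-in for the DB; a real bot would persist these records.
class FailingNodeStore {
  private readonly endpoints = new Set<string>();

  // Step 1: record a failing endpoint (recording it twice keeps a single record).
  markFailing(endpoint: string): void {
    this.endpoints.add(endpoint);
  }

  // Step 3: remove the record once the node is healthy again.
  markHealthy(endpoint: string): void {
    this.endpoints.delete(endpoint);
  }

  all(): string[] {
    return Array.from(this.endpoints);
  }
}

// Step 2: one run of the scheduled job. Sends at most one summary per run
// and stays silent when no node is failing.
function runSummaryJob(
  store: FailingNodeStore,
  notify: (failing: string[]) => void
): boolean {
  const failing = store.all();
  if (failing.length === 0) {
    return false;
  }
  notify(failing);
  return true;
}

// 15 minutes, as suggested in the issue; assumed to come from config in practice.
const SUMMARY_INTERVAL_MS = 15 * 60 * 1000;

// Possible wiring (not executed here):
// setInterval(() => runSummaryJob(store, sendToDiscord), SUMMARY_INTERVAL_MS);
```

Because the job reads the current records on every run, a node that recovers between two runs simply drops out of the next summary.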

Example notification

@Storage Worker Failing node(s) alert: 
1. http://cutieblockchains.com
2. http://cutieblockchains.com
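A small sketch of how a message in the format above might be assembled. formatAlert and its mention parameter are hypothetical. On Discord, an actual role mention is written with the role's ID rather than its display name, so the default string here is only a placeholder.

```typescript
// Hypothetical helper: builds one summary message in the example's format.
function formatAlert(
  endpoints: string[],
  mention: string = "@Storage Worker" // placeholder; a real bot would use a role ID mention
): string {
  // Number the endpoints starting from 1, one per line.
  const lines = endpoints.map((endpoint, i) => `${i + 1}. ${endpoint}`);
  return [`${mention} Failing node(s) alert:`, ...lines].join("\n");
}
```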

This functionality needs to be added to the existing codebase: https://github.com/singulart/joy-disco-bots

Estimate

6-8h

singulart commented 2 years ago

Impediment: https://github.com/Joystream/joystream/issues/4032

singulart commented 2 years ago

Had to move forward despite the impediment.

singulart commented 2 years ago

@oleksanderkorn @freakstatic @0x2bc PR is ready, feel free to review.