SimplyStaking / panic

PANIC Monitoring and Alerting For Blockchains
Apache License 2.0

Add SIGNL4 as Alerting Channel #1

Open rons4 opened 3 years ago

rons4 commented 3 years ago

Add the app-based alerting service SIGNL4 as additional alerting channel in PANIC.

I just talked to a SIGNL4 user who wanted to send alerts from PANIC to his SIGNL4 team. After some investigation I found this repository and wondered whether you are open to adding SIGNL4 as an additional alerting channel. If so, I would be happy to submit a pull request for it.

Thanks a lot

Ron

dillu24 commented 3 years ago

Hi @rons4 , sorry for the late reply.

First and foremost thanks for your interest in our project.

We would be happy to receive PRs for adding SIGNL4 as an alerting channel. The more we are able to cater for our users, the better :)

Please let us know if you have any questions regarding our code-base.

rons4 commented 3 years ago

Thanks @dillu24, that's good to hear and we will prepare the PR accordingly. I got the docker installation running already and we will go from there.

I will let you know if I should have any questions.

Thanks a lot Ron

rons4 commented 3 years ago

I am making some progress with the integration ... [screenshot]

However, I could now use some help with your code base.

As you can see from the screenshot above I successfully added SIGNL4 to the configuration page. Right now I have two questions:

In signl4Table.js and signl4Schema.js the variable signl4s seems to be undefined. I tried to do everything the same way as for PagerDuty and OpsGenie, but there the variables pagerduties and opsgenies are valid.

Could you please point me to the code where these variables are initialized?

Also, I am still not too clear on how the actual sending of alerts works. Do you do this in the Python part?

I will keep investigating how this works and appreciate any assistance.

Thanks Ron

dillu24 commented 3 years ago

Hi @rons4

@VitalyVolozhinov can help you out with the .js parts.

With regards to alerting, yes that is done in the Python part. The SIGNL4 code needs to be added in the Channels Manager component here: https://github.com/SimplyVC/panic/tree/master/alerter/src/channels_manager

When it comes to the python part, I suggest first looking at the alerter design here: https://github.com/SimplyVC/panic/blob/master/docs/DESIGN_AND_FEATURES.md

VitalyV1337 commented 3 years ago

@rons4 Please create a PR so I can see the code.

rons4 commented 3 years ago

Thanks @dillu24, thanks @VitalyVolozhinov for your quick reply.

It is still work in progress so I did not create a PR yet. I forked my current status here: https://github.com/rons4/panic

If I still shall create a PR, just let me know.

I will dive deeper into the Python part as well now.

VitalyV1337 commented 3 years ago

Thank you, I will look into it by the end of this week.

VitalyV1337 commented 3 years ago

At a glance, you didn't add the SIGNL4 reducer to the index.js: https://github.com/rons4/panic/blob/master/web-installer/src/redux/reducers/index.js. I will look into it later on.
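
For illustration, here is a minimal sketch of why a reducer that is not registered in the root reducer leaves its state slice undefined. The `combineReducers` function below is a hand-written stand-in for the Redux function of the same name so that the snippet runs without dependencies, and the reducer bodies and state shape are assumptions, not the installer's actual code.

```javascript
// Minimal stand-in for Redux's combineReducers, written out so the sketch
// runs without dependencies. Each key becomes one slice of the store state.
function combineReducers(reducers) {
  return (state = {}, action) => {
    const next = {};
    for (const [key, reducer] of Object.entries(reducers)) {
      next[key] = reducer(state[key], action);
    }
    return next;
  };
}

// Hypothetical channel reducers; the real ones live in the web-installer.
const opsgenies = (state = { byId: {}, allIds: [] }, action) => state;
const signl4s = (state = { byId: {}, allIds: [] }, action) => state;

// Without the SIGNL4 reducer, the store has no `signl4s` slice.
const before = combineReducers({ opsgenies })(undefined, { type: '@@INIT' });

// Once registered, the slice exists and components can read it.
const after = combineReducers({ opsgenies, signl4s })(undefined, { type: '@@INIT' });

console.log(before.signl4s); // undefined
console.log(after.signl4s.allIds); // []
```

Any component that reads `state.signl4s` gets `undefined` until the reducer is added to the object passed to `combineReducers`, which would match the symptom described earlier in the thread.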

rons4 commented 3 years ago

Thanks @VitalyVolozhinov, yes, this was the missing piece. Sorry, I overlooked this one and will now look more into the sending (Python part). Thanks again for your help and I will get back to you if I get stuck again.

VitalyV1337 commented 3 years ago

No problem :)

rons4 commented 3 years ago

The frontend and configuration part seems to work fine now.

However, when I send a test message I get this: Box.js:11 POST https://localhost:8000/server/signl4/test net::ERR_CONNECTION_REFUSED

It might be my overall setup, since I cannot send Opsgenie messages either: Box.js:11 POST https://localhost:8000/server/opsgenie/test net::ERR_EMPTY_RESPONSE

I can send Telegram test messages but this seems to be implemented in the JavaScript part and not in Python.

Any idea about what might be wrong here is greatly appreciated. I used the standard Docker setup using this command: docker-compose up -d --build

Thanks again.

VitalyV1337 commented 3 years ago

The server uses credentials found in the .env, best way to test is through the front-end after logging in.

rons4 commented 3 years ago

Thank you for your quick reply. I did that: I log in to the frontend, configure SIGNL4, Telegram or Opsgenie, and then send a test message from there. In the browser's developer console I get these responses: Box.js:11 POST https://localhost:8000/server/signl4/test net::ERR_CONNECTION_REFUSED or Box.js:11 POST https://localhost:8000/server/opsgenie/test net::ERR_EMPTY_RESPONSE

VitalyV1337 commented 3 years ago

Please push your current changes to the branch so that I can test it out.

rons4 commented 3 years ago

Thanks and it is all there already: https://github.com/rons4/panic

My feeling is that it might be something related to my setup because also the Opsgenie test alert does not work. Once this works and I know what is called in the Python part, I can start debugging the SIGNL4 implementation there.

VitalyV1337 commented 3 years ago

Change the function in the server.js to this one:

// This endpoint triggers a test alert event to the SIGNL4 space.
app.post('/server/signl4/test', verify, async (req, res) => {
  console.log('Received POST request for %s', req.url);
  const { teamSecret } = req.body;
  // Check if teamSecret is missing.
  const missingParamsList = utils.missingValues({ teamSecret });

  // If some required parameters are missing inform the user.
  if (missingParamsList.length !== 0) {
    const err = new errors.MissingArguments(missingParamsList);
    res.status(err.code).send(utils.errorJson(err.message));
    return;
  }

  // The SIGNL4 webhook URL
  const signl4URL = `https://connect.signl4.com/webhook/${teamSecret}`;

  axios
    .post(signl4URL, { params: {} })
    .then((_) => {
      const msg = new msgs.TestAlertSubmitted();
      res.status(utils.SUCCESS_STATUS).send(utils.resultJson(msg.message));
    })
    .catch((err) => {
      console.error(err);
      if (err.code === 'ECONNREFUSED') {
        const msg = new msgs.MessageNoConnection();
        res.status(utils.ERR_STATUS).send(utils.errorJson(msg.message));
      } else {
        const msg = new msgs.ConnectionError();
        // Any other failure, for example the SIGNL4 webhook responded with a
        // non-2xx status (axios rejects on those by default)
        res.status(utils.ERR_STATUS).send(utils.errorJson(msg.message));
      }
    });
});

rons4 commented 3 years ago

Thanks again and yes, this works now.

Now, when I click Finish in the Setup Completed! box I get this error: Box.js:11 POST https://localhost:8000/server/config?configType=channel&fileName=signl4_config.ini&chainName=&baseChain= 433 (unknown)

Obviously the SIGNL4 config cannot be saved somehow.

Also, what's the best way to simulate an alert without having any monitoring tool? That would be my next step after being able to save since I still did not test the Python part at all.

Thanks again for any hints. The code is in my branch.

dillu24 commented 3 years ago

I'll leave the installer part to @VitalyVolozhinov . With regards to alerting, the best way is to generate a test alert via the installer when setting up the channel. Apart from that, the alerter component (python part) is the only component which generates alerts, and it does so based on alert rules set during the installation procedure.

VitalyV1337 commented 3 years ago

@rons4 try adding this to server/config.js

const USER_CONFIG_SIGNL4 = 'signl4_config.ini';

and update ALL_CHANNELS_CONFIG_FILES to be like this

const ALL_CHANNELS_CONFIG_FILES = [
  USER_CONFIG_TELEGRAM, USER_CONFIG_EMAIL, USER_CONFIG_TWILIO,
  USER_CONFIG_PAGERDUTY, USER_CONFIG_OPSGENIE, USER_CONFIG_SIGNL4,
];
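
As a rough sketch of why the missing entry could block saving: if the server checks the requested `fileName` against this list before writing, a file that is not listed is rejected. The thread does not show that check, so the helper `isKnownChannelConfig` below is hypothetical, and the Telegram file name is assumed by analogy with `signl4_config.ini`.

```javascript
// SIGNL4 file name taken from the thread; the Telegram value is an assumption.
const USER_CONFIG_TELEGRAM = 'telegram_config.ini';
const USER_CONFIG_SIGNL4 = 'signl4_config.ini';

// Hypothetical helper: true only for recognised channel config files.
function isKnownChannelConfig(fileName, allowedFiles) {
  return allowedFiles.includes(fileName);
}

// The list before and after registering the SIGNL4 config file.
const withoutSignl4 = [USER_CONFIG_TELEGRAM];
const withSignl4 = [USER_CONFIG_TELEGRAM, USER_CONFIG_SIGNL4];

console.log(isKnownChannelConfig('signl4_config.ini', withoutSignl4)); // false
console.log(isKnownChannelConfig('signl4_config.ini', withSignl4)); // true
```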

rons4 commented 3 years ago

Hello @dillu24, Hello @VitalyVolozhinov, thanks a lot again and saving the config works fine now.

Also, the SIGNL4 test alert from the frontend works fine.

However, I still wonder how the Python part comes in (and how to test it). From my perspective it seems the test message I can send from the config page is all handled in the Node.js part.

Is there an easy way I can test the Python part for "real" alerting? Sorry, if I should have missed something here.

If this is tested I will go ahead with the PR.

dillu24 commented 3 years ago

So let me give some context:

Installer (JS): The job of the installer is to provide an easy walkthrough procedure to set-up monitoring and alerting for a chain. The installer outputs a set of configuration files including some alert rules. Let's say we save the following alert rule, that if the system RAM usage is above 95% we get a critical alert.

Alerter (Python): The job of the alerter is to gather metrics from the nodes, repos, smart contracts and compare these metrics with the alerting thresholds stored inside the configuration. If some alerting rules are met, the alerter will generate alerts. Therefore in our example, if the RAM usage suddenly becomes 100%, a critical alert is sent to the critical channels associated with that chain.

So, to generate an alert from the Python part, I suggest first connecting a node and checking its RAM usage. If, say, the RAM usage is 45 percent, set the critical threshold to 40 so that a critical alert is sent.
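
The threshold idea can be sketched as follows. This is only an illustration in JavaScript: the real alerter is written in Python, and the rule shape, the warning level and the function name `evaluateThreshold` are assumptions rather than PANIC's actual configuration format.

```javascript
// Hypothetical rule shape; PANIC's real configuration format may differ.
const loweredRule = { warning: 30, critical: 40 };
const defaultRule = { warning: 85, critical: 95 };

// Returns the most severe level whose threshold the value exceeds, else null.
function evaluateThreshold(value, rule) {
  if (value > rule.critical) return 'CRITICAL';
  if (value > rule.warning) return 'WARNING';
  return null;
}

// 45% RAM usage only triggers an alert once the thresholds are lowered.
console.log(evaluateThreshold(45, loweredRule)); // CRITICAL
console.log(evaluateThreshold(45, defaultRule)); // null
```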

Now with regards to development what remains is to integrate SIGNL4 as a channel in the Python part. That can be done in the Channels Manager component (a similar channel and channel handler must be developed). You can look at the implementation of OpsGenie or any other channel as an example.

rons4 commented 3 years ago

Thank you for your clarification @dillu24.

Since I have no other monitoring option at hand I tried the GitHub monitor and I monitor my repository. When changes occur I don't see anything happening (e.g. in the logs). It might well be that I just don't know where to look or how to debug the Python code.

PS: By the way, I entered "rons4/panic" as the repository name but the corresponding test failed. In the log I saw a missing character, so I entered "rons4/panicc". This then worked. It seems the last character is cut off.

dillu24 commented 3 years ago

Repo names are normally expected in the following format "rons4/panic/". @VitalyVolozhinov could this potentially be a bug in the installer or alerter?

rons4 commented 3 years ago

Ah, OK, got it ;-)

VitalyV1337 commented 3 years ago

The forms in the installer state that there must be a trailing slash. Regex verification will be added later on.
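
A minimal sketch of what such a check might look like, assuming repository names take the form "owner/repo/". The pattern and the name `isValidRepoName` are illustrative only and are not taken from the installer. The last two lines show a guess at the symptom reported above: code that drops the final character, assuming it is the slash, would turn "rons4/panicc" into "rons4/panic".

```javascript
// Hypothetical pattern: "owner/repo/" with a required trailing slash.
const REPO_NAME_PATTERN = /^[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+\/$/;

// True when the name has exactly two segments followed by a trailing slash.
function isValidRepoName(name) {
  return REPO_NAME_PATTERN.test(name);
}

console.log(isValidRepoName('rons4/panic/')); // true
console.log(isValidRepoName('rons4/panic')); // false

// Dropping the last character without checking that it is a slash:
console.log('rons4/panic/'.slice(0, -1)); // rons4/panic
console.log('rons4/panicc'.slice(0, -1)); // rons4/panic
```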

rons4 commented 3 years ago

I still cannot simulate a real alert, neither for SIGNL4 nor for Telegram.

Do you have any test system or configuration I could use in the chains configuration that will trigger an alert? I tried to enter fake data but could not get any alert. Sorry, I am not too familiar with this part.

Thanks again.

VitalyV1337 commented 3 years ago

@rons4 I've noticed you are working on the master branch, you should switch to develop to get the most updated version of PANIC.

rons4 commented 3 years ago

Oh no ;-)

OK, let me switch, apply the changes and see how it goes. I will keep you posted.

rons4 commented 3 years ago

OK, I have now merged my changes with the develop branch: https://github.com/rons4/panic

The configuration part seems to work. I can also send the test messages successfully. However, the config is not saved although it says success. When I try to load the config it says the file might be corrupted. Maybe because I still have the format of the previous version.

Very nice new portal by the way ;-)

rons4 commented 3 years ago

OK, I have created a PR now and I hope all is fine. I still have difficulties with the Python part and with testing it using real monitoring alerts. When I configure the monitoring I get a black page when clicking Next. I am not sure if this is related to the develop branch or if I did something wrong in the configuration. I used some fake settings. Please let me know if I should change anything, and thanks a lot for all your assistance.

VitalyV1337 commented 3 years ago

I will look at your PR. Thank you for your submission.

rons4 commented 3 years ago

Thanks @VitalyVolozhinov.