algorandfoundation / grow-algorand

Grow Algorand and Earn ALGOs ❤️

Connect Algorand node to the ELK stack #38

Open michielmulders opened 3 years ago

michielmulders commented 3 years ago

Overview

Description

What is this task?

Create a practical tutorial, with working configuration files, that demonstrates how to connect the ELK stack to an Algorand node and use it for monitoring.

This includes:

What are the requirements for the bounty taker?

The bounty taker should know how to run a node and how to read and understand its logs. Experience setting up and operating Elasticsearch, Logstash, and Kibana is also required.

What are the deliverables?

1. GitHub project

A GitHub project with all necessary configuration files and the exported Kibana dashboard.

Judging Criteria and Metrics

Submission Procedure

2. Tutorial

Submit a tutorial on the Algorand Developer Portal that describes all the steps.

Judging Criteria and Metrics

Submission Procedure

Submit your blog post following these steps:

Other Requirements

For questions, reach out to Algorand on Discord.

gitcoinbot commented 3 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This issue now has a funding of 850.0 ALGO (875.5 USD @ $1.03/ALGO) attached to it as part of the algorandfoundation fund.

michielmulders commented 3 years ago

Application from @dozham: https://github.com/algorandfoundation/grow-algorand/issues/32

gitcoinbot commented 3 years ago

Work has been started.

These users each claimed they can complete the work by 265 years, 4 months from now. Please review their action plans below:

1) dozham has applied to start work.

I am still interested in doing the task and am in the process of gaining deeper Algorand knowledge, so I reapplied as you recommended (because the issue was moved).

2) kshays has applied to start work.

Part 1 - Prepare Logstash configs: Create a standalone process that holds the Logstash configuration for aggregating logs from all Algorand nodes, applying a few filters (if needed), and forwarding them to Elasticsearch (ES). Since the node logs are already in JSON format, they can be persisted as-is in a document store like ES (document IDs need to be auto-generated, though).
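Part 1 notes that the node logs are already JSON and can be stored as-is, with document IDs generated on the Elasticsearch side. A minimal Python sketch of that idea, assuming one JSON object per line of `node.log`: it builds an Elasticsearch `_bulk` request body, and because no `_id` is set in the `index` action, Elasticsearch assigns one to each document. The index name `algod-logs` is hypothetical, and in the proposed setup Logstash's `elasticsearch` output would do this step for you.

```python
import json


def build_bulk_body(log_lines, index="algod-logs"):
    """Build an Elasticsearch _bulk request body from JSON log lines.

    Assumptions: each non-empty line is one JSON object, and the index
    name "algod-logs" is hypothetical. The action lines carry no "_id",
    so Elasticsearch auto-generates a document ID for every entry.
    """
    parts = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)  # fails loudly on a malformed line
        parts.append(json.dumps({"index": {"_index": index}}))
        parts.append(json.dumps(doc))
    # The bulk API expects newline-delimited JSON ending in a newline.
    return "\n".join(parts) + "\n" if parts else ""
```

The returned string can be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`.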

Part 2 - Once the logs are available in ES, identify the fields in the log entries required to create metrics for Kibana dashboards. Rules for creating common metrics in the dashboard can also be stored in the new Git project and imported directly into a Kibana instance for quick setup.

Part 3 - Write a blog post on setting up APM for Algorand nodes using ELK in 15 minutes.

3) stylishsquid has started work.

Proposal:

  1. Build out an ELK stack.
  2. Set up an Algorand node.
  3. Ingest Algorand logs: Filebeat -> Logstash -> ES
  4. Configure a Logstash filter to parse and enrich the log data. This will make it easier to build the Kibana dashboard.
  5. Configure Kibana and build out a dashboard. What can be shown in the dashboard depends on what is printed in the logs. From what I have seen in the logs, some metrics that we could include are:
  6. Documentation/Blog
     6.1. For the setup of the Algorand node and the ELK stack, reference other blog posts, or maybe create a condensed version of those instructions here.
     6.2. Break down the types of Algorand logs and what each part of a log entry indicates. This will help in building the Logstash filter.
     6.3. How to write a Logstash filter to enrich Algorand logs.
     6.4. How to build a Kibana dashboard. This brings together steps 6.2 and 6.3, that is, understanding the logs and enriching them.
     6.5. Other examples of how to query Algorand logs.
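Step 4's Logstash filter is where each event would be parsed and enriched. The sketch below expresses that logic in Python only to make it concrete; it is not Logstash configuration. Assumptions: each line is a JSON object with logrus-style `level` and `time` fields (check your own `node.log`), and the `severity_rank` field is a hypothetical enrichment. Unparseable lines are kept and tagged `_jsonparsefailure`, the same tag Logstash's `json` filter adds on failure.

```python
import json

# Hypothetical ranking so dashboards can filter "warning and worse".
# Level names follow logrus conventions; treating them as algod's is an assumption.
SEVERITY_RANK = {
    "trace": 0, "debug": 1, "info": 2, "warning": 3,
    "error": 4, "fatal": 5, "panic": 6,
}


def enrich(raw_line):
    """Parse one log line and add derived fields, Logstash-filter style."""
    try:
        event = json.loads(raw_line)
    except json.JSONDecodeError:
        # Keep the raw text and tag it, as Logstash's json filter does.
        return {"message": raw_line, "tags": ["_jsonparsefailure"]}
    if not isinstance(event, dict):
        # Valid JSON but not an object (e.g. an array): treat as a failure too.
        return {"message": raw_line, "tags": ["_jsonparsefailure"]}
    level = str(event.get("level", "unknown")).lower()
    event["level"] = level
    event["severity_rank"] = SEVERITY_RANK.get(level, -1)  # -1 = unknown level
    # Promote the log's own timestamp to @timestamp when it is present.
    if "time" in event:
        event["@timestamp"] = event["time"]
    return event
```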

Experience:

I have a lot of experience running infrastructure and platforms, and with the monitoring and tooling around them, especially ELK. But in terms of my Algorand experience, I am a beginner. I have set up a node locally and generated logs from MainNet. I have been using carpenter to get more context, and have been reading the docs to better understand the steps of the consensus protocol. Ultimately, I would need some help fully understanding the log statements, what can be parsed, and consequently what can be displayed in the dashboard. I would engage the Discord community to answer my questions as I learn.

Learn more on the Gitcoin Issue Details page.

michielmulders commented 3 years ago

@stylishsquid I've approved you for this bounty! If you need help, please join our Discord and one of the network/node related channels if you have questions about which metrics matter or what's needed. There are plenty of people who can help if you need clarification. Please keep me posted on your progress in this GitHub thread! Thanks!

michielmulders commented 3 years ago

@stylishsquid - Can you let us know your progress on this bounty? Thanks! :)

c0d5x commented 3 years ago

I just found this bounty and I want to participate. What metrics have to be included in the Kibana dashboard? Is there a spec for the end result? Also, is this for non-relay nodes? Thanks for your response!

michielmulders commented 3 years ago

@stylishsquid Are you still working on the bounty or can we reassign this one? Thanks!

ori-shem-tov commented 3 years ago

@c0d5x You can find some examples of metrics in the bounty description and in StylishSquid's proposal. This should work for both relay and non-relay nodes.

StylishSquid commented 3 years ago

Hey @michielmulders. Sorry for the delay in responding. Weird timing with when this got assigned and when I started traveling. Just got back and have more time this week.

Current Update:

michielmulders commented 3 years ago

Ok cool, keep us posted!

michielmulders commented 2 years ago

@StylishSquid Can you share a status update? :) Cheers!

fleeingDeer commented 2 years ago

Hi @michielmulders is this issue still open? Can someone else work on this project for the bounty? Thanks!

michielmulders commented 2 years ago

@StylishSquid Do you think you'll be able to finish this in 2 weeks? If not, I would rather pass it on to other bounty hunters like @fleeingDeer

StylishSquid commented 2 years ago

Hey Michiel. Please pass it on to @fleeingDeer. Apologies for the late response.

On Sun, Aug 1, 2021 at 10:17 AM Michiel Mulders @.***> wrote:

@StylishSquid https://github.com/StylishSquid Do you think you'll be able to finish this in 2 weeks? If not, I would rather pass it on to other bounty hunters like @fleeingDeer https://github.com/fleeingDeer

michielmulders commented 2 years ago

@fleeingDeer What's your experience with Algorand? @c0d5x was the first to reapply for this bounty a while ago, if they're still up for it. Let me know, both of you! Cheers

fleeingDeer commented 2 years ago

@michielmulders Just like @StylishSquid, I have experience with the ELK stack. I have been reading the Algorand docs; I would say I'm a beginner. If @c0d5x chooses to do the bounty since they were first, that's okay with me :)

michielmulders commented 2 years ago

Let's see, @c0d5x would you like to start on it? Otherwise, you can even decide to work together if you wish to. We can award up to 1000 ALGO for a well-functioning solution!

fleeingDeer commented 2 years ago

@michielmulders Quick question: is there full creative freedom for this project? Can we automate the deployment of the ELK stack in a cloud environment using Terraform?

Thanks!

michielmulders commented 2 years ago

@fleeingDeer The goal of this project is to create a ready-made template for other node runners so they can quickly set up the ELK stack, with the included dashboard deployed and connected to their node to show its stats. So yes, we expect you to deliver some sort of recipe that spins everything up at once without too much hassle for node runners. Does that make sense? :)

fleeingDeer commented 2 years ago

@michielmulders sounds good, can you please reach out when this issue is assigned to me or co-assigned to @c0d5x and myself? Thanks!

michielmulders commented 2 years ago

@danmurphy1217 Thank you for that submission, but it does not include the Kibana visualizations or the automatic setup of the dashboard. Maybe you can coordinate with @fleeingDeer to complete the project.

@fleeingDeer you have to apply to the issue for me to approve you: https://gitcoin.co/issue/algorandfoundation/grow-algorand/38/100025861

danmurphy1217 commented 2 years ago

Sure, I can collaborate with @fleeingDeer or try to set up the automatic creation of the dashboards tonight. Happy to do either.

danmurphy1217 commented 2 years ago

Hi @michielmulders, I went back in and made a bunch of improvements:

  1. Make commands that automatically download, configure, and run the Elasticsearch and Kibana servers.
  2. A Make command to run the node script that populates Elasticsearch; all that has to be provided is the URL or file path and the mappings for the data.
  3. An NDJSON export of the dashboard, which can be imported into Kibana in about a minute.
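The NDJSON export can be imported through the Kibana UI, or scripted against Kibana's saved objects import API (`POST /api/saved_objects/_import`, multipart field `file`, `kbn-xsrf` header). A standard-library Python sketch that builds such a request without sending it; the Kibana URL `http://localhost:5601` and the file name are assumptions, and a cluster with security enabled would also need credentials.

```python
import urllib.request
import uuid


def build_import_request(ndjson_bytes, kibana_url="http://localhost:5601",
                         filename="dashboard.ndjson", overwrite=True):
    """Build (but do not send) a Kibana saved-objects import request.

    Targets POST /api/saved_objects/_import with a multipart "file" field.
    The default kibana_url and filename are assumptions for illustration.
    """
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/ndjson\r\n\r\n"
    ).encode() + ndjson_bytes + f"\r\n--{boundary}--\r\n".encode()
    url = f"{kibana_url}/api/saved_objects/_import"
    if overwrite:
        url += "?overwrite=true"  # replace saved objects that already exist
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "kbn-xsrf": "true",  # Kibana requires this on API calls that modify data
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the result to `urllib.request.urlopen`, with `ndjson_bytes` read from the exported `.ndjson` file in binary mode. Using `overwrite=true` makes re-running the import safe when the dashboard already exists.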

Here is a video walkthrough of the entire process. It took me less than 5 minutes to download and configure Elasticsearch and Kibana, upload data to Elasticsearch (over 300k data points), and import the dashboard into Kibana. I'm sure there's still room for improvement, but hopefully this is close to what y'all are looking for!

I will push the code to main now. Let me know if you have any feedback.

michielmulders commented 2 years ago

@danmurphy1217 that looks great! I think the final step would be to show more detailed metrics:

General
-- Log severity metrics
-- Network issues (disconnect, reconnect, etc.)

Consensus:
-- Number of rounds, block proposals, soft votes, certified votes
-- Number of proposals or votes accepted and rejected
-- Number of times the node has been selected as a leader or committee member; percent of time selected in the lottery
-- Average amount of weight (Algo) per committee

Transactions
-- Not sure yet what could be shown based on the logs (amount spent? top addresses that sent the most transactions? ...)
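A "log severity metrics" panel in Kibana boils down to a terms aggregation on the log level. As a hedged sketch, assuming the logs were indexed with Elasticsearch's default dynamic mapping (so a string field `level` gets a `level.keyword` sub-field), this builds the search body and converts the response into a percentage per level.

```python
def severity_agg_query(field="level.keyword", size=10):
    """Elasticsearch search body that counts documents per log level.

    "level.keyword" assumes the default dynamic mapping, which adds a
    keyword sub-field to string fields; adjust it for a custom mapping.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits themselves
        "aggs": {"by_level": {"terms": {"field": field, "size": size}}},
    }


def bucket_shares(response):
    """Turn a terms-aggregation response into {level: percent of logs}."""
    agg = response["aggregations"]["by_level"]
    buckets = agg["buckets"]
    # Include documents that fell outside the top `size` buckets.
    total = sum(b["doc_count"] for b in buckets) + agg.get("sum_other_doc_count", 0)
    if not total:
        return {}
    return {b["key"]: 100.0 * b["doc_count"] / total for b in buckets}
```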

danmurphy1217 commented 2 years ago

Sounds good, could you offer any guidance around where to find these data points in the log files? @michielmulders

michielmulders commented 2 years ago

@danmurphy1217 This might be helpful: https://developer.algorand.org/docs/reference/node/artifacts/ Other than that, you can find a lot of information just by running the node on MainNet or TestNet and seeing what data gets printed.
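Following the suggestion to run a node and see what gets printed, a small Python sketch can inventory which top-level fields actually appear in `node.log`, which helps decide what to map and chart. It assumes one JSON object per line and skips anything else.

```python
import json
from collections import Counter


def field_inventory(log_lines):
    """Count how many JSON log lines contain each top-level key.

    Assumes one JSON object per line; blank, malformed, or non-object
    lines are skipped rather than counted.
    """
    seen = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON (or empty): ignore it
        if isinstance(event, dict):
            seen.update(event.keys())
    return seen
```

For example, pass an open `node.log` file object and call `.most_common()` on the result to list the fields from most to least frequent.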

danmurphy1217 commented 2 years ago

Thanks for sending that. I was able to get some useful info from the node.log file in my node directory. Here's what I added to the dashboard:

  1. Top Transaction Types (Bar Chart)
  2. Total Weight per Transaction (Pie Chart)
  3. Unique Sender Addresses (Metric)
  4. Top Sender Addresses (Bar Chart)
  5. Minimum and Maximum of Round (Metric)

I put that dashboard in kibana_algorand_db.ndjson in the GitHub repo. Hope this is helpful!

michielmulders commented 2 years ago

That's a great start, can you also look at the consensus and network related metrics? @danmurphy1217

General
-- Log severity metrics
-- Network issues (disconnect, reconnect, etc)

Consensus:
-- Number of rounds, Block proposals, soft votes, certified votes
-- Number of proposals or votes accepted and rejected.
-- Number of times the node has been selected as a leader or committee member. Percent of time selected in lottery.
-- Average amount of weight (Algo) per committee

danmurphy1217 commented 2 years ago

Yes, I can. I tried to incorporate this data into the last dashboard, but I didn't see it in the log files for the node I am running. I can look at log severity metrics and network issues with algoh, but I'm not really sure where the consensus data resides. Could you offer any specific guidance for this? Thank you!

michielmulders commented 2 years ago

@danmurphy1217 Can you reach out on Discord in the run-a-node and/or governance channels? Many people are running nodes and can help by providing specific data, or even their log history so you can try out your governance dashboard on top of that data (pretty cool to see how that works out). Let me know if that helps!

danmurphy1217 commented 2 years ago

I added in the following things to another dashboard (kibana_node_consensus_and_governance.ndjson).

General
-- Network issues (disconnect, reconnect, etc.)

Consensus:
-- Number of rounds
-- Number of proposals or votes accepted
-- Number of proposals or votes dropped
-- Average amount of weight (Algo) per committee

This should cover nearly 100% of what this task was asking for. The only pointer I got from Discord was the metrics endpoint, which is what let me get these data points.

Let me know if you have any other questions, and thanks for letting me take on this project.

michielmulders commented 2 years ago

@ori-shem-tov Do you have any comments on this? :) If not, @danmurphy1217 can start on the blog post to explain how to use it and quickly set it up for node runners!

michielmulders commented 2 years ago

@danmurphy1217 This is the feedback I got from the team:

It looks like he didn't use Logstash to ingest the logs as we asked; instead, he's using his own JS code to write the logs to the DB, which could be OK if everything works. Besides that, he's taking data about votes from the metrics endpoint, which is not specific to the node but a total for the entire network.

Can you check if you are using the metrics endpoint for this? This doesn't provide you with the correct data.

ajgrande924 commented 2 years ago

@michielmulders I saw this bounty on Gitcoin. Is it possible for me to start work on this, or is someone currently working on it?

michielmulders commented 2 years ago

@ajgrande924 I think you can work on this unless @danmurphy1217 is still on this one?

gitcoinbot commented 2 years ago

The funding of 850.0 ALGO (1385.50 USD @ $1.74/ALGO) attached to this issue has been cancelled by the bounty submitter

latonis commented 2 years ago

Hi @michielmulders, is this still available to work on? Also, is this still eligible for funding?

michielmulders commented 2 years ago

It's paused for now!