bro-n-bro / cybernode

🧠 Provider for the Great Web.
https://cybernode.ai

Monitoring metrics for cybernode #12

Closed abitrolly closed 6 years ago

abitrolly commented 6 years ago

Related issues:

Story A - healthcheck for cybernode

As a user, I want to see that my cybernode is healthy, and if it is not, then see the reason why.
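A rough sketch of what "healthy, and if not, why" could look like in code. Everything here is an assumption for illustration: the `HealthReport` type, the `run_healthcheck` function and the component names in the usage note are hypothetical and not part of cybernode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A check returns None when the component is healthy,
# or a human-readable reason when it is not.
Check = Callable[[], Optional[str]]


@dataclass
class HealthReport:
    healthy: bool
    reasons: Dict[str, str]  # component name -> why it is unhealthy


def run_healthcheck(checks: Dict[str, Check]) -> HealthReport:
    """Run every check and collect the reasons for the failing ones."""
    reasons: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            reason = check()
        except Exception as exc:  # a crashing check counts as a failure too
            reason = f"check raised {type(exc).__name__}: {exc}"
        if reason is not None:
            reasons[name] = reason
    return HealthReport(healthy=not reasons, reasons=reasons)
```

Calling it with something like `{"indexer": lambda: None, "chain-node": lambda: "no new block received"}` would give an unhealthy report with one reason attached to `chain-node`.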

Story B - cybernode monitoring

As a developer/contributor, I want to see how cybernode works. I want to see what it is doing, whether there are any bottlenecks or anomalies, and whether the node is synchronized with other nodes.

Design Considerations

For people who want a simple status, a page with all the bells and whistles is no fun. It is possible to design a fluid SVG interface (Lottie?) that contains the whole cybernode blueprint with all its moving components and colors each component according to its status: a SCADA system on Lottie.

But before we get there, we can use Prometheus + Grafana for all sorts of required info, and we should hide the advanced options.
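For the Prometheus part, a minimal sketch of rendering a gauge in the Prometheus text exposition format using only the standard library. The metric name `cybernode_block_height` and the `chain` label in the usage note are made up for illustration, and a real setup would more likely use an official client library. Label values are not escaped here.

```python
from typing import Dict, Iterable, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value)


def render_gauge(name: str, help_text: str, samples: Iterable[Sample]) -> str:
    """Render one gauge metric as Prometheus text exposition lines."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable between scrapes.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

For example, `render_gauge("cybernode_block_height", "Latest block height", [({"chain": "bitcoin"}, 100)])` produces a HELP line, a TYPE line and one sample line that a Prometheus scraper could read from an HTTP endpoint.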

Data and processes

List of processes that cybernode is doing:

Story C - cybernode sanity

As a "business" owner, I want to be absolutely sure that cybernode is sane and is giving the latest available information for making "business" decisions. That includes stats such as whether an expected block failed to arrive in time or whether something is wrong with the network, and it should be visible somehow on the main page.
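The "expected block did not arrive in time" check could be as simple as the sketch below. The function name and the `tolerance` factor are assumptions; the right threshold depends on the chain, since block times vary a lot.

```python
def block_is_overdue(
    last_block_time: float,
    now: float,
    expected_interval: float,
    tolerance: float = 3.0,
) -> bool:
    """Return True when no block has arrived for `tolerance` expected intervals.

    All times are in seconds. `expected_interval` is the average block time
    of the chain, and `tolerance` allows for normal variance in block times.
    """
    return (now - last_block_time) > expected_interval * tolerance
```

With a 600 second expected interval and the default tolerance, a node whose last block is 1900 seconds old would be flagged, while one at 1700 seconds would not.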

Story D - cybernode tamagotchi

cybernode is likely not the only process running on the system, so it would be nice to see how much it "costs" to run certain components. Before we can tell that, we need to collect those stats.

abitrolly commented 6 years ago

Stats per component: how much CPU it consumes, how much memory it eats, and whether memory usage grows over time. Also, what achievements (CPU, mem, hard) do I need to reach to add new components (features/abilities)?
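To answer "does memory usage grow over time?" from collected samples, one option is a least-squares slope over (timestamp, value) pairs. This is only a sketch: the function name is hypothetical, and how the samples are collected per component is left open.

```python
from typing import Sequence, Tuple


def growth_per_second(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of value over time.

    `samples` are (timestamp_seconds, value) pairs, e.g. memory in bytes.
    A clearly positive result over a long window hints at steady growth.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        raise ValueError("timestamps must not all be equal")
    return sum((t - mean_t) * (v - mean_v) for t, v in samples) / denom
```

The same function works for CPU or disk usage samples, as long as each component's samples are kept separately.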

Story E - backend ops

As an application developer, I want to know how much time a user spends in the app and which backend errors occur for specific user requests. I want to get events when something fails or crashes. I need to know whether the node went down and when, and whether people made changes in the client app while it was down.

For example, I track some user-level events only while the backend is working. When the backend is down, a tx comes in, but we don't catch it and can't report it to the app. When the connection is restored, we resync, and we may miss the event where a tx arrives in the mempool AND THEN in a block: we only see the tx in the block.
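One way to reason about this is to derive downtime windows from backend heartbeats and then flag events whose timestamps fall inside them. The sketch below assumes a heartbeat log exists, which is not something the thread confirms; both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Window = Tuple[float, float]  # (last heartbeat before the gap, first one after)


def downtime_windows(heartbeats: Sequence[float], max_gap: float) -> List[Window]:
    """Return gaps between consecutive heartbeats that are longer than max_gap."""
    windows = []
    for prev, cur in zip(heartbeats, heartbeats[1:]):
        if cur - prev > max_gap:
            windows.append((prev, cur))
    return windows


def possibly_missed(event_times: Sequence[float], windows: Sequence[Window]) -> List[float]:
    """Return event timestamps that fall strictly inside any downtime window."""
    return [t for t in event_times if any(start < t < end for start, end in windows)]
```

A tx whose mempool arrival time lands in such a window is a candidate for "we only saw it in the block", so the app could be told that its earlier state was not observed.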

As a developer, I also want to trace the speed of requests and of the various components that add to the final lag, such as the speed of DB access.
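Tracing how much each component adds to the lag can start with a small timing helper like this one. It is a sketch using only the standard library; the `timed` helper, the `timings` store and the `db_access` label in the usage note are assumptions, not existing cybernode code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator, List

# component name -> list of measured durations in seconds
timings: DefaultDict[str, List[float]] = defaultdict(list)


@contextmanager
def timed(component: str) -> Iterator[None]:
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[component].append(time.perf_counter() - start)
```

Wrapping a database call in `with timed("db_access"):` appends one duration to `timings["db_access"]`, which can later be exported as a metric or summarized per request.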

abitrolly commented 6 years ago

Metrics

Indexation process.

mastercyb commented 6 years ago

The block height of every blockchain node is the most needed thing to start with.
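As a starting point, the block height per chain can be compared against a reference height for the same chain to get a lag figure. In this sketch the chain names and the source of the reference heights (for example another node or a public explorer) are assumptions.

```python
from typing import Dict, Optional


def height_lag(
    local: Dict[str, int], reference: Dict[str, int]
) -> Dict[str, Optional[int]]:
    """Blocks each local node is behind its reference, per chain.

    Returns None for a chain that has no reference height, so a missing
    reference is not mistaken for "fully synced".
    """
    lag: Dict[str, Optional[int]] = {}
    for chain, height in local.items():
        ref = reference.get(chain)
        # Clamp at zero: a local node ahead of a stale reference is not lagging.
        lag[chain] = None if ref is None else max(ref - height, 0)
    return lag
```

Heights from different chains are never compared with each other here, only a local node against the reference for its own chain.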

hleb-albau commented 6 years ago

@abitrolly Could you please check our latest monitoring service:
http://monitoring.cybersearch.io/d/94l_L2Nmz/elassandra-monitoring?refresh=1m&orgId=1
http://monitoring.cybersearch.io/dashboards

hleb-albau commented 6 years ago

not active, closed.