Closed abitrolly closed 6 years ago
Stats per component: how much CPU it consumes, how much memory it uses, and whether memory usage grows over time. Also, what achievements (CPU, memory, hard disk) I need to get in order to add new components (features/abilities).
As an application developer, I want to know about backend errors that occur for specific user requests. I want to get events when something has failed or crashed. I need to know if the node went down and when, and whether people made changes in the client app while it was down.
For example, I track some user-level events only while the backend is working. When the backend is down, a tx comes in, but we don't catch it and can't report it to the app. When the connection is restored, we resync, and we may miss the event when a tx arrives in the mempool AND THEN in a block - we only get the tx in the block.
As a developer, I also want to trace the speed of requests and of the various components that add to the final lag, such as the speed of DB access.
Indexation process.
Block height of every blockchain node is the most needed thing to start from
@abitrolly Could you please check our latest monitoring service: http://monitoring.cybersearch.io/d/94l_L2Nmz/elassandra-monitoring?refresh=1m&orgId=1 http://monitoring.cybersearch.io/dashboards
not active, closed.
Related issues:
Story A - healthcheck for cybernode
As a user, I want to see that my cybernode is healthy, and if it is not, then see the reason why.
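A minimal sketch of a healthcheck that reports a reason together with the status. The `CheckResult` type, the `run_healthcheck` function and the component names in the usage note are hypothetical; this issue does not list cybernode's actual components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CheckResult:
    ok: bool
    reason: str = ""  # human-readable explanation when ok is False


def run_healthcheck(
    checks: Dict[str, Callable[[], CheckResult]]
) -> Tuple[bool, List[str]]:
    """Run every check and collect the reasons for any failures."""
    reasons = []
    for name, check in checks.items():
        try:
            result = check()
        except Exception as exc:  # a crashing check counts as unhealthy
            result = CheckResult(False, f"check raised {type(exc).__name__}: {exc}")
        if not result.ok:
            reasons.append(f"{name}: {result.reason}")
    # Healthy only when no check reported a problem.
    return (not reasons, reasons)
```

For example, `run_healthcheck({"index": lambda: CheckResult(False, "lagging")})` returns `(False, ["index: lagging"])`, which gives the user both the status and the reason.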
Story B - cybernode monitoring
As a developer/contributor, I want to see how cybernode works. I want to see what it is doing, whether there are any bottlenecks or anomalies, and whether the node is synchronized with other nodes.
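One way to answer the synchronization question is to compare the local block height with the heights reported by peers. The sketch below assumes the heights have already been fetched; `sync_lag`, `is_synchronized` and the `max_lag` threshold are illustrative names and values, not part of cybernode.

```python
from typing import Dict


def sync_lag(local_height: int, peer_heights: Dict[str, int]) -> int:
    """Number of blocks the local node is behind the best known peer."""
    if not peer_heights:
        return 0  # no peers to compare against; reported as zero lag
    return max(0, max(peer_heights.values()) - local_height)


def is_synchronized(
    local_height: int, peer_heights: Dict[str, int], max_lag: int = 2
) -> bool:
    """Treat the node as synchronized when it is at most max_lag blocks behind."""
    return sync_lag(local_height, peer_heights) <= max_lag
```

A small tolerance such as `max_lag` avoids flagging the node every time a peer happens to see a new block slightly earlier.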
Design Considerations
For people who want a simple status, looking at a page with all the bells and whistles is not fun. It is possible to design a fluid SVG interface (Lottie?) that contains the whole cybernode blueprint with all moving components, and colors each component according to its status. SCADA system on Lottie. But before we get there, we may use Prometheus+Grafana for all sorts of required info, and we should hide advanced options.
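As a rough illustration of the Prometheus route, the sketch below renders per-component values in the Prometheus text exposition format, which a scrape endpoint could serve for Grafana to chart. The metric name `cybernode_component_up` and the component names are assumptions made for this example, and the code assumes label values contain no quotes, backslashes or newlines.

```python
from typing import Dict


def render_gauge(
    name: str, help_text: str, samples: Dict[str, float], label: str = "component"
) -> str:
    """Render one gauge metric, with one labelled sample per component."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, value in sorted(samples.items()):
        # Produces lines such as: metric{component="db"} 1
        lines.append(f'{name}{{{label}="{label_value}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical component names, for illustration only.
    print(render_gauge("cybernode_component_up", "1 if the component is up", {"index": 1, "db": 0}))
```

In practice an existing Prometheus client library would normally handle this rendering; the sketch only shows what the scraped data looks like.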
Data and processes
List of processes that cybernode is doing:
Story C - cybernode sanity
As a "business" owner, I want to be absolutely sure that cybernode is sane and is giving the latest available information for making "business" decisions. That includes stats such as whether an expected block did not arrive in time or whether something is wrong with the network, and it should be visible somehow on the main page.
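The "expected block did not arrive in time" check could be sketched as follows. The `ChainStatus` type, the `tolerance` factor and the chain names in the usage note are assumptions for illustration; real block intervals vary by chain and would need to be configured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChainStatus:
    name: str
    last_block_time: float    # Unix time (seconds) when the latest block was seen
    expected_interval: float  # typical number of seconds between blocks


def is_stale(status: ChainStatus, now: float, tolerance: float = 3.0) -> bool:
    """True when no block arrived within tolerance * expected_interval seconds."""
    return (now - status.last_block_time) > tolerance * status.expected_interval


def stale_chains(
    statuses: List[ChainStatus], now: float, tolerance: float = 3.0
) -> List[str]:
    """Names of chains whose latest block is older than expected."""
    return [s.name for s in statuses if is_stale(s, now, tolerance)]
```

A non-empty result from `stale_chains` is the kind of signal that could be surfaced on the main page.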
Story D - cybernode tamagotchi
cybernode is likely not the only process running on the system, so it would be nice to see how much it "costs" to run certain components. Before we can tell that, we need to collect those stats.
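Once memory samples are collected per component, a simple trend estimate can show whether a component's memory usage grows over time. The sketch below fits a least-squares slope to `(timestamp, bytes)` samples; how the samples are gathered is left open, and the function name is hypothetical.

```python
from typing import List, Tuple


def memory_growth_rate(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of memory (bytes) over time (seconds).

    A clearly positive value over a long window suggests steady growth.
    """
    n = len(samples)
    if n < 2:
        return 0.0  # not enough data to estimate a trend
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return 0.0  # all samples share one timestamp
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t
```

Running this per component over the same time window gives comparable growth rates, which is one possible input to the "cost" of each component.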