Currently, we process the extensive data in slogs offline, using tools such as jq or custom scripts. Adopting an approach similar to the otel/tracing one could instead let us extract valuable metrics directly from slogs and send them to monitoring services like Datadog.
We can focus on extracting two types of metrics:
Pre-existing Backend Metrics: Taken from slog entries of type cosmic-swingset-after-commit-stats, which already include statistics such as forcedGc, memoryUsage, and heapStats.
Calculated Performance Metrics: Derived from slog events, such as the time taken to complete a crank, contract call durations, and syscall durations.
This approach would streamline our monitoring process and improve the real-time visibility of our system's performance.
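To make the two metric types concrete, here is a minimal Python sketch of what such an extractor might look like. It is an illustration under stated assumptions, not the actual slog schema or a proposed implementation: it assumes slogs are newline-delimited JSON, that crank boundaries appear as crank-start and crank-finish entries carrying crankNum and a time field in seconds, and that forcedGc, memoryUsage, and heapStats sit at the top level of the stats entry. The metric names and the helpers flatten_numbers and extract_metrics are hypothetical.

```python
import json

# Hypothetical sample slog lines. The field layout (crankNum, time, top-level
# stats keys) is an assumption made for illustration only.
SAMPLE_SLOG = """\
{"type": "crank-start", "crankNum": 7, "time": 100.0}
{"type": "crank-finish", "crankNum": 7, "time": 100.25}
{"type": "cosmic-swingset-after-commit-stats", "time": 101.0, "forcedGc": false, "memoryUsage": {"rss": 1024, "heapUsed": 512}, "heapStats": {"total_heap_size": 2048}}
"""


def flatten_numbers(prefix, value):
    """Yield (name, number) pairs for every numeric leaf under value."""
    if isinstance(value, bool):
        # Booleans are reported as 0/1 so they can be sent as gauges.
        yield prefix, int(value)
    elif isinstance(value, (int, float)):
        yield prefix, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from flatten_numbers(f"{prefix}.{key}", child)


def extract_metrics(lines):
    """Yield (metric_name, value) pairs from an iterable of slog lines."""
    crank_starts = {}  # crankNum -> start time, for cranks still in flight
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        kind = entry.get("type")
        if kind == "cosmic-swingset-after-commit-stats":
            # Pre-existing backend metrics: forward the numbers as they are.
            for key in ("forcedGc", "memoryUsage", "heapStats"):
                if key in entry:
                    yield from flatten_numbers(f"slog.commit_stats.{key}", entry[key])
        elif kind == "crank-start":
            crank_starts[entry["crankNum"]] = entry["time"]
        elif kind == "crank-finish":
            # Calculated performance metric: pair the finish with its start.
            started = crank_starts.pop(entry["crankNum"], None)
            if started is not None:
                yield "slog.crank.duration_seconds", entry["time"] - started


if __name__ == "__main__":
    for name, value in extract_metrics(SAMPLE_SLOG.splitlines()):
        print(name, value)
```

Contract call and syscall durations could be derived the same way, by pairing the corresponding start and finish events. The resulting name/value pairs would then be handed to whatever client the monitoring service provides.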
What is the Problem Being Solved?
Description of the Design
Security Considerations
Scaling Considerations
Test Plan
Upgrade Considerations