Inphi opened this issue 1 day ago
Prometheus isn't very good at pulling metrics from short-lived processes like cannon. The normal pull model assumes it can periodically scrape metrics from a long-running server. There is a Pushgateway that lets things like batch jobs push metrics when they run, and cannon could use that, but I'm not sure what support we have for it in Grafana Cloud. Ultimately, that's why DebugInfo was introduced to report memory usage.
We need to collect richer metrics on threading-related behavior, including the number of steps between ll/sc instructions, time spent between context switches, etc. These metrics are collected during a VM run, and ideally we'd ship them to Prometheus as soon as they're collected.
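For reference, the Pushgateway route could look something like the sketch below: render the per-run metrics in the Prometheus text exposition format and PUT them to the gateway once the run finishes. This is a stdlib-only illustration, not cannon's actual instrumentation; the metric names and the gateway URL are made up, and in practice `github.com/prometheus/client_golang/prometheus/push` would do this for us.

```go
// Minimal sketch of pushing per-run metrics to a Prometheus Pushgateway
// using only the standard library. Metric names and the gateway address
// are hypothetical, for illustration only.
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// expositionBody renders metrics in the Prometheus text exposition format,
// which is what the Pushgateway accepts on PUT /metrics/job/<job>.
func expositionBody(metrics map[string]float64) string {
	names := make([]string, 0, len(metrics))
	for name := range metrics {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output
	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "%s %g\n", name, metrics[name])
	}
	return b.String()
}

// pushMetrics PUTs the rendered metrics to the gateway under the given job.
func pushMetrics(gateway, job string, metrics map[string]float64) error {
	url := fmt.Sprintf("%s/metrics/job/%s", gateway, job)
	req, err := http.NewRequest(http.MethodPut, url,
		strings.NewReader(expositionBody(metrics)))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "text/plain; version=0.0.4")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("pushgateway returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Hypothetical threading metrics collected during a VM run.
	metrics := map[string]float64{
		"cannon_ll_sc_steps_total":       1234,
		"cannon_context_switch_ns_total": 56789,
	}
	fmt.Print(expositionBody(metrics))
	// Pushed once at the end of the run, e.g.:
	// pushMetrics("http://localhost:9091", "cannon", metrics)
}
```

The open question from above still applies: this only helps if whatever scrapes the Pushgateway (e.g. Grafana Cloud) is set up to do so.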
The alternative is to keep these metrics in memory and write them out to DebugInfo. However, that may produce very large debug files for the op-challenger to ingest.