-
We should track host-level metrics such as memory usage, disk usage, CPU usage, etc., and combine them with the existing block processing time data that `replayor` already collects to give a more holist…
-
### Image name
metrics-proxy
### Short description
A lightweight proxy designed to expose a unified metrics endpoint for multiple Kubernetes pods.
### Image repository
https://github.com/canonica…
-
The following paper by the FAIR Metrics Group may also be relevant: [https://www.nature.com/articles/sdata2018118](https://www.nature.com/articles/sdata2018118)
Here is the GitHub repository of the g…
-
# Current
Currently, our unit tests for the main model training code ultimately rely on diffs between the files.
# Objective
On top of (not replacing) the diff tests, we should also add tests to…
-
**What would you like to be added**:
Expose metrics about the JobSet resource: https://jobset.sigs.k8s.io/docs/reference/jobset.v1alpha2/
**Why is this needed**:
For better monitoring of AI t…
-
`cockroachdb.changefeed.internal_retry_message` is a gauge, but it should be a counter. This makes the Datadog integration awkward to use (we have to use derivative instead of the native rate, and no `count…
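
To illustrate why a counter type is easier to consume, here is a minimal sketch of deriving a per-second rate from cumulative counter samples. It is not how Datadog or CockroachDB computes rates; the function name `counter_rate` and the sample data are hypothetical, and the reset handling is an assumption about typical monotonic-counter semantics.

```python
from typing import List, Tuple


def counter_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second rate from (timestamp_seconds, cumulative_count) samples.

    Assumes the count only increases, except when the process restarts and
    the counter starts again from zero (hypothetical reset semantics).
    """
    if len(samples) < 2:
        return 0.0
    total = 0.0
    for (_, v0), (_, v1) in zip(samples, samples[1:]):
        delta = v1 - v0
        if delta < 0:
            # Counter went backwards: treat it as a reset and count from zero.
            delta = v1
        total += delta
    elapsed = samples[-1][0] - samples[0][0]
    return total / elapsed if elapsed > 0 else 0.0


# Hypothetical samples: the counter resets between t=10 and t=20.
samples = [(0, 10), (10, 30), (20, 5), (30, 25)]
rate = counter_rate(samples)
```

A gauge carries no such monotonicity guarantee, so a consumer has to fall back on a plain derivative and cannot tell a reset from a genuine decrease.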
-
### Area(s)
area:k8s
### Is your change request related to a problem? Please describe.
Part of https://github.com/open-telemetry/semantic-conventions/issues/1032.
This issue is for adding …
-
Hello,
While checking the AWS docs https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Quotas-Visualize-Alarms.html I realized that it's possible to have service quota metr…
-
Run time on a per-node level