Some good first metrics to start with:
Job duration in seconds (by name): indicates slow queries
Job success or failure (by name): indicates failures
Since these jobs run periodically as cronjobs, they are too short-lived for Prometheus to scrape directly, so we would likely have to push the metrics to a Prometheus Pushgateway instead.
Open to other ideas for meaningful metrics, but we really just want to capture information that might lead to alerts (i.e. failing jobs or long run times).
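
To make the proposal concrete, here is a rough sketch of how a job could time itself and push the two metrics when it finishes. It is only an illustration under several assumptions: the metric names `job_duration_seconds` and `job_success`, the helper names, and the gateway address `http://localhost:9091` are all hypothetical, and it talks to the Pushgateway's `/metrics/job/<name>` HTTP endpoint using only the standard library rather than whatever client this project ends up choosing.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable

# Assumption: a Pushgateway reachable at its default port on localhost.
PUSHGATEWAY_URL = "http://localhost:9091"


def render_metrics(duration_seconds: float, success: bool) -> str:
    """Render both metrics in the Prometheus text exposition format."""
    lines = [
        "# TYPE job_duration_seconds gauge",
        f"job_duration_seconds {duration_seconds:.3f}",
        "# TYPE job_success gauge",
        f"job_success {1 if success else 0}",
    ]
    # The Pushgateway expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def push_metrics(job_name: str, body: str, base_url: str = PUSHGATEWAY_URL) -> None:
    """PUT the metrics into the group identified by the job name."""
    # Assumption: job names do not contain "/" (those need special encoding).
    url = f"{base_url}/metrics/job/{urllib.parse.quote(job_name, safe='')}"
    request = urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")
    request.add_header("Content-Type", "text/plain; version=0.0.4")
    with urllib.request.urlopen(request, timeout=10):
        pass


def run_and_report(
    job_name: str,
    job: Callable[[], None],
    push: Callable[[str, str], None] = push_metrics,
) -> bool:
    """Run the job, then push its duration and outcome under its name."""
    start = time.monotonic()
    try:
        job()
        success = True
    except Exception:
        success = False
    duration = time.monotonic() - start
    push(job_name, render_metrics(duration, success))
    return success
```

A cron entry point could then call something like `run_and_report("nightly_report", run_query)` (both names made up here) and exit non-zero when it returns `False`. Alerts would be written against `job_success == 0` and against `job_duration_seconds` exceeding some per-job threshold, with the job name available as the `job` label.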