canonical / seldon-core-operator

Seldon Core Operator
Apache License 2.0

fix: correct metrics path for MetricsEndpointProvider (#236) #240

Closed DnPlas closed 5 months ago

DnPlas commented 5 months ago

The metrics endpoint configuration had two scrape jobs: one for the regular metrics endpoint, and a second one based on a dynamic list of targets. The latter caused the Prometheus scraper to try to scrape metrics from *:80/metrics, which is not a valid endpoint. As a result, the UnitsUnavailable alert fired constantly, because that job kept reporting that the endpoint was unavailable. The second job was introduced by canonical/seldon-core-operator#94 with no apparent justification. Because the seldon charm has changed since that PR, and the endpoint the job configures is not valid, this commit removes the extra job.

This commit also refactors the MetricsEndpointProvider instantiation and removes the metrics-port config option as this value should not change.
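
For illustration only, here is a minimal sketch of what a single static scrape job could look like, together with a small check that flags targets like *:80. It uses plain dictionaries and does not import the charm library. The key names (`metrics_path`, `static_configs`, `targets`) follow the job format that, to my knowledge, the `jobs` argument of MetricsEndpointProvider accepts. The port value 8080 and both helper names are assumptions and are not taken from this PR.

```python
# Assumed constants: the PR only says the port value should not change.
METRICS_PATH = "/metrics"
METRICS_PORT = 8080  # hypothetical value, not confirmed by the PR


def build_scrape_jobs(port=METRICS_PORT, path=METRICS_PATH):
    """Return one static scrape job (hypothetical helper)."""
    return [
        {
            "metrics_path": path,
            "static_configs": [{"targets": [f"*:{port}"]}],
        }
    ]


def targets_outside_port(jobs, expected_port):
    """List every target whose port differs from the expected metrics port."""
    bad = []
    for job in jobs:
        for static_config in job.get("static_configs", []):
            for target in static_config.get("targets", []):
                # Everything after the last ":" is treated as the port.
                _, _, port = target.rpartition(":")
                if port != str(expected_port):
                    bad.append(target)
    return bad


# A second job pointing at port 80, similar in spirit to the one removed here.
extra_job = {"static_configs": [{"targets": ["*:80"]}]}
old_jobs = build_scrape_jobs() + [extra_job]
new_jobs = build_scrape_jobs()
```

With these definitions, `targets_outside_port(old_jobs, METRICS_PORT)` reports the `*:80` target, while the same call on `new_jobs` reports nothing.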

Finally, this commit changes the alert rule interval from 0m to 5m, as this interval is more appropriate for production environments.
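
To show why the interval matters, here is a simplified sketch of how a hold duration affects firing. It assumes the interval refers to the `for` clause of the alert rule and that the rule watches an `up`-style value. Both are assumptions, and this is a toy model rather than Prometheus' actual evaluation loop.

```python
def alert_fires(samples, for_seconds):
    """Return True if `up < 1` holds continuously for at least `for_seconds`.

    `samples` is a list of (timestamp_seconds, up_value) pairs sorted by time.
    Simplified model: the condition is only checked at the sample timestamps.
    """
    pending_since = None
    for timestamp, up in samples:
        if up < 1:
            if pending_since is None:
                pending_since = timestamp  # condition just became true
            if timestamp - pending_since >= for_seconds:
                return True
        else:
            pending_since = None  # condition cleared, reset the timer
    return False


# One missed scrape between two healthy ones.
blip = [(0, 1), (60, 0), (120, 1)]
# Unit down for five full minutes.
outage = [(t, 0) for t in range(0, 301, 60)]
```

In this model a hold of 0 seconds fires on the single missed scrape in `blip`, whereas a hold of 300 seconds ignores it and still fires on `outage`.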

Part of canonical/bundle-kubeflow#564

The test_prometheus_grafana_integration test case queried Prometheus and checked that the request returned successfully and that the application name and model were listed correctly. To make this test case more accurate, we can add an assertion that also checks that the unit is available; this way we avoid issues like the one described in canonical/bundle-kubeflow#564.
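
As a rough sketch of such an assertion, the helper below inspects the JSON body of a Prometheus instant query for `up` and checks that at least one sample for the application has the value "1". The helper name, the sample payloads, and the use of the `juju_application` label are assumptions for illustration. The actual test may query and filter differently.

```python
def unit_is_available(response, application):
    """Check an instant-query response for an `up == 1` sample of the app.

    `response` is the decoded JSON body of a Prometheus /api/v1/query call.
    """
    if response.get("status") != "success":
        return False
    for sample in response.get("data", {}).get("result", []):
        if sample.get("metric", {}).get("juju_application") != application:
            continue
        _, value = sample["value"]  # [timestamp, "value as string"]
        if value == "1":
            return True
    return False


# Hypothetical payloads shaped like Prometheus instant-query responses.
healthy = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "juju_application": "seldon-controller-manager"},
                "value": [1700000000.0, "1"],
            }
        ],
    },
}
unavailable = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "juju_application": "seldon-controller-manager"},
                "value": [1700000000.0, "0"],
            }
        ],
    },
}
```

A test could then end with `assert unit_is_available(body, APP_NAME)`, where `body` and `APP_NAME` stand for whatever the test already has in scope.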

Part of canonical/bundle-kubeflow#564

DnPlas commented 5 months ago

CI is failing because of https://github.com/canonical/bundle-kubeflow/issues/813; #241 should fix it.