canonical / seldon-core-operator

Seldon Core Operator
Apache License 2.0

`test_seldon_alert_rules` test case is failing, potential race condition #244

Closed DnPlas closed 7 months ago

DnPlas commented 7 months ago

Bug Description

The test case is failing with the following message:

  File "/home/runner/work/seldon-core-operator/seldon-core-operator/tests/integration/test_charm.py", line 204, in test_seldon_alert_rules
    assert up_query_response["data"]["result"][0]["value"][1] == "1"
IndexError: list index out of range

which means that a list in up_query_response is shorter than expected, most likely an empty `result` list (a missing `data` or `value` key would raise a KeyError instead).
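To illustrate the failure mode, here is a minimal sketch that assumes Prometheus answered the `up` query successfully but with no samples yet. The response dictionary and the helper name `first_sample_value` are hypothetical and only mirror the indexing used in the failing assertion; the actual CI response was not captured in this issue.

```python
# Assumed shape of an instant-query response before any `up` sample exists.
# This is an illustration, not the response captured from the CI run.
empty_response = {
    "status": "success",
    "data": {"resultType": "vector", "result": []},
}


def first_sample_value(response: dict) -> str:
    """Mirror the indexing used by the failing assertion."""
    return response["data"]["result"][0]["value"][1]


try:
    first_sample_value(empty_response)
    error_name = None
except IndexError as err:
    # An empty `result` list makes `[0]` fail before `value` is ever read.
    error_name = type(err).__name__

print(error_name)  # prints "IndexError"
```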

This issue started happening after 1d1a6f5 introduced a new assertion to ensure the up metric is not firing any alerts.

This issue is affecting both main and track/1.17.

To Reproduce

I was only able to reproduce it in the CI

Environment

on_push CI

Relevant Log Output

Latest CI run

syncronize-issues-to-jira[bot] commented 7 months ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5374.

This message was autogenerated

DnPlas commented 7 months ago

There is a potential race condition between the test case's assertion and the Prometheus charm's scraping of metrics from seldon-controller-manager: the assertion can run before the scraped metrics are available. In other charms, we have added retry logic to give Prometheus time to scrape the metrics and expose them at the Prometheus endpoint. #243 attempts to fix this issue by adding a retry.
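A retry of this kind could look roughly like the sketch below. It is not the change made in #243; the helper name `wait_for_up_metric`, its parameters, and the `query` callable are assumptions chosen for illustration, and it uses only the standard library rather than whichever retry mechanism the other charms rely on.

```python
import time
from typing import Any, Callable, Dict


def wait_for_up_metric(
    query: Callable[[], Dict[str, Any]],
    attempts: int = 10,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `query` until the response contains at least one result.

    `query` is a hypothetical callable returning the parsed JSON of a
    Prometheus instant query. Returns the sample value of the first
    result, or raises AssertionError once `attempts` tries are exhausted.
    """
    for attempt in range(1, attempts + 1):
        response = query()
        results = response.get("data", {}).get("result", [])
        if results:
            # Instant-vector samples are [timestamp, value-as-string].
            return results[0]["value"][1]
        if attempt < attempts:
            # Give Prometheus time for another scrape before retrying.
            sleep(delay_s)
    raise AssertionError(f"'up' metric not available after {attempts} attempts")
```

In the test, the direct index access would then be replaced by something like `assert wait_for_up_metric(run_up_query) == "1"`, where `run_up_query` stands in for however the test already queries Prometheus. Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.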