JulianLegler / oxn

oxn helps you specify and execute observability experiments

Error during collection of response variables #17

Closed JulianLegler closed 1 month ago

JulianLegler commented 1 month ago

Collecting the response variables at the end of the benchmark currently seems to fail. The Jaeger request is refused, so no trace data comes back, and the Prometheus response is empty as well.

Here are some logs:

[2024-09-11 16:18:38,157] Debian-bookworm-latest-amd64-base/WARNING/urllib3.connectionpool: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe3742f5bd0>: Failed to establish a new connection: [Errno 111] Connection refused')': /jaeger/ui/api/traces?start=1726063591881161&end=1726064312045956&service=recommendationservice&limit=100000
[2024-09-11 16:18:38,162] Debian-bookworm-latest-amd64-base/INFO/oxn.observer: failed to capture recommendation_traces, proceeding. Error while talking to Jaeger at http://10.0.0.2:8080/jaeger/ui/api/traces: HTTPConnectionPool(host='10.0.0.2', port=8080): Max retries exceeded with url: /jaeger/ui/api/traces?start=1726063591881161&end=1726064312045956&service=recommendationservice&limit=100000 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe3743102d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
[2024-09-11 16:18:38,172] Debian-bookworm-latest-amd64-base/INFO/oxn.observer: failed to capture system_CPU, proceeding. Cannot create dataframe from empty Prometheus response: list index out of range
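
The "list index out of range" message in the last log line is what Python raises when the first element of an empty list is accessed, which would happen if the Prometheus `result` array is empty and the code reads `result[0]`. The following is a minimal sketch of a guard for that case, not oxn's actual code: the function name `extract_samples` is hypothetical, and the response shape assumed is the one the Prometheus HTTP API returns for range queries.

```python
def extract_samples(response: dict) -> list[tuple[float, float]]:
    """Return (timestamp, value) pairs from a Prometheus range-query response.

    Hypothetical helper: returns an empty list instead of raising
    when the query matched no series.
    """
    result = response.get("data", {}).get("result", [])
    if not result:
        # Reading result[0] on an empty list raises
        # "IndexError: list index out of range".
        return []
    # Assumption: only the first returned series is of interest.
    first_series = result[0]
    return [(float(ts), float(value)) for ts, value in first_series.get("values", [])]


# Example payloads in the Prometheus range-query format (values are made up).
empty = {"status": "success", "data": {"resultType": "matrix", "result": []}}
filled = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"job": "node"},
                "values": [[1726063591, "0.25"], [1726063592, "0.5"]],
            }
        ],
    },
}

print(extract_samples(empty))
print(extract_samples(filled))
```

With a guard like this, an empty response can be reported as "no data for this time range" rather than as an indexing error.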

Also, the Jaeger service seems to die and respawn occasionally when it is accessed.

JulianLegler commented 1 month ago

It seems the Jaeger pod is exiting with "OOMKilled". A probable fix is to increase its memory limits and check whether that is enough.
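
One way to confirm the "OOMKilled" diagnosis after raising the limits is to inspect the pod status that `kubectl get pod <pod-name> -o json` prints. The sketch below is an illustration under assumptions, not part of oxn: the helper name `oom_killed_containers` and the sample pod data are hypothetical, while the `containerStatuses`, `state`, `lastState` and `terminated.reason` fields follow the Kubernetes pod status format.

```python
def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose current or last state is OOMKilled.

    Hypothetical helper; `pod` is the parsed JSON of a single pod.
    """
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # A restarted container reports the kill under "lastState";
        # a container that has not restarted yet reports it under "state".
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break
    return names


# Made-up pod status for illustration: the container is running again
# after having been killed for exceeding its memory limit.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "jaeger",
                "restartCount": 3,
                "state": {"running": {"startedAt": "2024-09-11T16:18:40Z"}},
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            }
        ]
    }
}

print(oom_killed_containers(sample_pod))
```

If the list stays empty across a full benchmark run, the new limit (set under `resources.limits.memory` in the container spec) is probably sufficient.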