apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0
2.11k stars 915 forks

[Improvement] K8s pod OOM Killed should be identified as Application failed state #6720

Closed Madhukar525722 closed 1 month ago

Madhukar525722 commented 1 month ago

What would you like to be improved?

Currently, when a user's engine pod is OOMKilled, the session fails with "Error operating LaunchEngine". Even if the user opens a new session, Kyuubi connects to the same dead engine until the engine timeout expires, so the error persists. This hurts the experience of users who do not have visibility into the cluster.

How should we improve?

Expected behaviour: instead of the application being mapped to the UNKNOWN state, an OOMKilled pod should be mapped to KILLED. The application is then treated as failed, and the user can reconnect with a new session.
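
To illustrate the idea, here is a minimal sketch of such a mapping. It is not Kyuubi's actual code: `AppState`, `to_app_state`, and `should_launch_new_engine` are hypothetical names. The only assumptions taken from Kubernetes are the pod phases (Pending, Running, Succeeded, Failed, Unknown) and the container termination reasons (OOMKilled, Completed, Error).

```python
from enum import Enum
from typing import Optional


class AppState(Enum):
    """Hypothetical application states, loosely modelled on the issue text."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    KILLED = "KILLED"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"


# States after which the engine can never serve a session again.
TERMINAL_STATES = {AppState.FINISHED, AppState.KILLED, AppState.FAILED}

# Fallback mapping from the Kubernetes pod phase.
PHASE_TO_STATE = {
    "Pending": AppState.PENDING,
    "Running": AppState.RUNNING,
    "Succeeded": AppState.FINISHED,
    "Failed": AppState.FAILED,
}


def to_app_state(pod_phase: str, terminated_reason: Optional[str] = None) -> AppState:
    """Map a pod phase and an optional container termination reason to a state.

    The termination reason is checked first, so an OOMKilled container is
    reported as KILLED instead of falling through to UNKNOWN.
    """
    if terminated_reason == "OOMKilled":
        return AppState.KILLED
    if terminated_reason == "Error":
        return AppState.FAILED
    if terminated_reason == "Completed":
        return AppState.FINISHED
    return PHASE_TO_STATE.get(pod_phase, AppState.UNKNOWN)


def should_launch_new_engine(state: AppState) -> bool:
    """A terminal state means the cached engine reference should be dropped."""
    return state in TERMINAL_STATES
```

With this mapping, `to_app_state("Running", "OOMKilled")` yields `AppState.KILLED`, so `should_launch_new_engine` returns true and a reconnecting session would get a fresh engine rather than waiting for the engine timeout.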

github-actions[bot] commented 1 month ago

Hello @Madhukar525722, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.