Open omerlh opened 1 year ago
Hello! Thank you for filing an issue.
The maintainers will triage your issue shortly.
In the meantime, please take a look at the troubleshooting guide for bug reports.
If this is a feature request, please review our contribution guidelines.
We suspect this might be related to Karpenter removing a node when the runner pod is running on, and lack of PDB does not block this operation. I saw the chart does not have one, maybe worth adding?
Checks
Controller Version
v0.27.0
Helm Chart Version
0.22.0
CertManager Version
v1.9.1
Deployment Method
Helm
cert-manager installation
Using the official helm chart - https://charts.jetstack.io
Checks
Resource Definitions
Describe the bug
Recently we noticed job failing without any clear indication about the why:
This is the hang, it usually fails after job timeout time. There are no logs anywhere, so I have no idea why it is failing
Describe the expected behavior
Job runs without failures and not hanging
Whole Controller Logs
Whole Runner Pod Logs
Additional Context
No response