Open wmhutchison opened 4 days ago
https://access.redhat.com/support/cases/#/case/03990081 is the active case right now. Based on support feedback, the root cause is the kernel.
The RHEL 9 kernel fix: https://access.redhat.com/errata/RHSA-2024:9497 OCP releases and the specific kernel version each one ships: https://access.redhat.com/solutions/7077108
We are waiting for Red Hat to release an OCP version with the required kernel. Since OCP 4.14.41 was released on November 20th without the new kernel, we are now waiting for OCP 4.14.42 to include it.
At present we'll likely stay on course and fix this as part of the official OCP 4.16 upgrade, but if this issue worsens, we might need to rethink this and apply an OCP 4.14-latest in SILVER as soon as possible.
Describe the issue This ticket tracks effort spent investigating recent reboot issues in SILVER involving HPE gear, with no discernible hardware events causing the reboots.
Additional context Vendor cases:
Related incidents:
The affected nodes are hardware servers, but no hardware support tickets have been opened since this is not a hardware issue.
How does this benefit the users of our platform? Ensuring we have stable nodes to offer a consistent experience for our users.
Definition of done