tzumainn closed 2 weeks ago
Looks like we can get rescue images here: https://docs.openstack.org/ironic/latest/install/deploy-ramdisk.html
I'll test this out when I can!
@tzumainn what is the next step for this? Is this still in progress?
yep, still something I'm working on!
running into networking issues; talking to the Ironic folks to see if there's a way around them
The solution for this turns out to be pretty complicated: because of the delay in Ansible networking, the node is still on the rescue network by the time it has booted the rescue image, so its network interfaces still have the rescue-network IPs. I solved this by building a new rescue image with a custom change to ironic-python-agent that restarts the network interfaces after a five-minute delay (https://github.com/tzumainn/ironic-python-agent/commit/afff59cc281c3579eac4df7f697f0c31e7fc07dc).
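For anyone following along, here is a minimal sketch of the "wait, then restart the interfaces" idea. It is not the code from the linked commit. The function name, the `ip link` commands, and the interface name are all assumptions for illustration, and whether a link bounce is enough to pick up a new address depends on the network tooling in the rescue image.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def restart_interfaces_after_delay(
    interfaces: Sequence[str],
    delay_seconds: float = 300.0,  # five minutes, per the comment above
    sleep: Callable[[float], None] = time.sleep,
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> List[List[str]]:
    """Hypothetical helper: wait, then take each interface down and up again.

    `sleep` and `run` are injectable so the logic can be exercised
    without actually waiting or touching real interfaces.
    """
    # Give the (slow) network switch-over time to finish first.
    sleep(delay_seconds)

    commands = []
    for name in interfaces:
        for state in ("down", "up"):
            # Assumed mechanism: bounce the link with iproute2.
            cmd = ["ip", "link", "set", "dev", name, state]
            run(cmd)
            commands.append(cmd)
    return commands
```

On a real node this would be called as something like `restart_interfaces_after_delay(["eth0"])`, run as root, where `eth0` is a placeholder for whatever interfaces the node actually has.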
The only other change needed is to set default rescue ramdisk and kernels in ironic.conf.
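A rough sketch of what that ironic.conf change could look like, written as a small Python helper. I'm assuming the options are `rescue_kernel` and `rescue_ramdisk` under `[conductor]`; check the Ironic configuration reference for your release before relying on that. The helper name and the `file:///...` paths are placeholders. Note that `configparser` drops comments when it rewrites a file, so for a real ironic.conf editing by hand (or with your deployment tooling) is probably the safer route.

```python
import configparser
import io


def set_default_rescue_images(conf_text: str, kernel: str, ramdisk: str) -> str:
    """Hypothetical helper: return conf_text with default rescue images set.

    Assumes the [conductor] rescue_kernel / rescue_ramdisk option names.
    """
    # Disable interpolation so '%' characters in existing values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    if not parser.has_section("conductor"):
        parser.add_section("conductor")
    parser.set("conductor", "rescue_kernel", kernel)
    parser.set("conductor", "rescue_ramdisk", ramdisk)

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder image locations, not real paths from this deployment.
    print(set_default_rescue_images(
        "[DEFAULT]\ndebug = true\n",
        "file:///path/to/rescue.kernel",
        "file:///path/to/rescue.initramfs",
    ))
```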
I still need to formalize this in documentation and updates to esi-pilot.
Usage documentation: https://github.com/CCI-MOC/esi/pull/563
Updated playbooks: https://github.com/CCI-MOC/esi-pilot/pull/67
Users may corrupt their nodes and want to salvage whatever they can. Ironic has rescue functionality (https://docs.openstack.org/ironic/latest/admin/rescue.html) that may help with this.