spisarski closed this issue 6 years ago
Confirmed that with 16.04.5, the "-s" option does not restart nodes automatically in lab2 (HP DL G9), and I had to manually reboot the servers via iLO.
With the 16.04.5 ISO, nodes only sometimes reboot properly, and they can take an inordinately long time to restart.
Reverting to 16.04.4 appears to work; however, this bug affects any approach to automation, as the 16.04.4 image is no longer available from the Ubuntu download site.
Aricent could not reproduce the issue with 16.04.5 (on Dell servers); CableLabs has seen this issue on HP servers. Aricent will try to use the CableLabs lab1 compute nodes to reproduce it.
CableLabs did not have this reboot issue with 16.04.5 in lab3 (Dell PowerEdge servers) either, so this appears to be isolated to HP servers.
Aricent was able to access lab1 and reproduce it. We suspect the cause is either a disconnected power supply or the use of legacy BIOS mode. Bo will try to collect iLO logs by manually rebooting the server.
An additional experiment showed that Ubuntu startup took a long time after POST (see attached screenshot).
No obvious errors in POST or the log files. Need to dig into this further.
From the logs, it seems that the boot process gets stuck for a while trying to bring up the eno1 interface, where it waits for more than 5 minutes. The problem seems to be caused by cloud-init.
eno1 is not sending a DHCP request to the server, though cloud-init is configured to do so.
Issue #155 should fix this.
We will check and confirm the same.
I confirmed that applying the workaround for Issue #155 (i.e., disabling cloud-init) did the trick by reducing the boot-up time for lab1-compute1; however, is there any explanation for why we did not see the issue with 16.04.4?
During today's sync-up meeting, it was noted that disabling cloud-init would stop SSH keys from being properly set up for both VMs and bare metal. We need to reconsider the approach; therefore, the previously closed #155 is back under review.
We are able to set up SSH keys even after removing 50-cloud-init.cfg, the file responsible for network configuration, which cloud-init generates by default. To disable cloud-init's network configuration capabilities, we need to write a file /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg containing: network: {config: disabled}
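For illustration, a minimal Python sketch of writing that drop-in file. The function name `disable_cloud_init_network` and its `root` parameter are hypothetical (they are not part of iaas_launch.py); only the file path and its one-line content come from the comment above. The `root` argument is there so the file can be written into a mounted image or a scratch directory rather than the live system.

```python
from pathlib import Path

# File name and content as stated in the comment above.
DISABLE_NET_CFG = "99-disable-network-config.cfg"
DISABLE_NET_BODY = "network: {config: disabled}\n"


def disable_cloud_init_network(root="/"):
    """Write the drop-in that tells cloud-init not to manage networking.

    `root` is a hypothetical parameter: the filesystem root under which
    etc/cloud/cloud.cfg.d lives (e.g. a mounted image or a test directory).
    Returns the path of the file that was written.
    """
    cfg_dir = Path(root) / "etc" / "cloud" / "cloud.cfg.d"
    # Create the directory if the target root does not have it yet.
    cfg_dir.mkdir(parents=True, exist_ok=True)
    cfg_path = cfg_dir / DISABLE_NET_CFG
    cfg_path.write_text(DISABLE_NET_BODY)
    return cfg_path
```

Calling `disable_cloud_init_network()` with the default root needs write access to /etc/cloud/cloud.cfg.d. cloud-init itself keeps running, so SSH key setup is unaffected; only its network configuration is switched off.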
Bo will validate this in lab1.
With the fix from #155 in place, server reboot is now normal for 16.04.5.
As 16.04.4 is no longer available from the Ubuntu download site, we need to be sure that 16.04.5 still operates as designed. I have observed issues when using 16.04.5 after iaas_launch.py -s on 3 of 4 servers; however, I did not have time to pin down the issue at hand, so this task should help expose any potential issues.