Closed chloerh closed 1 month ago
@smitterl Can you help check this from s390x perspective?
Ran the arm acceptance tests and got no failures. @chunfuwen @smitterl anything on s390?
IIUC this is trying to handle a very specific UEFI boot-up screen that is only shown on first boot or when there's no nvram file. If so, in https://github.com/avocado-framework/avocado-vt/pull/3849 the proposed solution is to handle that file. Would this also be helpful here? @chloerh It is used to avoid the system reset in these tests: https://github.com/autotest/tp-libvirt/pull/5465
From s390x perspective: The reset string won't be shown, so I am worried that taking away the is_responsive part will have a negative impact. What do you think?
I found it could be helpful for the problem I'm trying to fix with the current PR. The problem is what's described in the comment below. I also think we need to discuss this with the team before we change the sync() function.
Thank you for bringing this up, I believe this could solve the problem at its root.
# Backup the EFI vars file before sync to prevent reset/reboot issue
# Updating a VM's XML using vmxml.sync() deletes the nvram file, which on a UEFI VM
# causes an unwanted reset/reboot.
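As a rough illustration of the backup idea in the comment above, here is a minimal sketch, not the code from the referenced PR. The helper name `sync_preserving_nvram` and the `sync_fn` callback are hypothetical; `sync_fn` stands in for whatever call redefines the VM (for example a `vmxml.sync()` call), and the NVRAM path is supplied by the caller.

```python
import shutil
import tempfile
from pathlib import Path


def sync_preserving_nvram(nvram_path, sync_fn):
    """Run sync_fn() and put the NVRAM file back if the sync removed it.

    Hypothetical helper: sync_fn is any zero-argument callable that
    redefines the VM and may delete the NVRAM file as a side effect.
    """
    nvram = Path(nvram_path)
    if not nvram.exists():
        # Nothing to preserve (first boot, or a non-UEFI guest).
        return sync_fn()
    with tempfile.TemporaryDirectory() as tmpdir:
        backup = Path(tmpdir) / nvram.name
        shutil.copy2(nvram, backup)
        try:
            return sync_fn()
        finally:
            # Restore only when the sync actually removed the file.
            if not nvram.exists():
                shutil.copy2(backup, nvram)
```

The restore runs in a `finally` block so the guest keeps its EFI variables even if the sync itself raises.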
Moreover, xml.sync() is widely used in many job features, such as virtual disk and detach/attach related cases. From the RHEL 9.5 job test results (all cases passed in multiple rounds), I cannot conclude that the reset screen appears. So I wonder whether it is specific to the VM XML, for example having some special device in the VM. We may need to dig into more details here.
Therefore, since this is a specific issue, I am still concerned about the necessity of this PR change. It looks like we are overturning the previous general solution in order to resolve a specific issue.
Additionally, can we consider addressing this issue in the specific tp-libvirt file, such as https://github.com/autotest/tp-libvirt/pull/5465?
If a reset/reboot happens in some specific cases, we need to reconsider whether the test case manipulates the VM life cycle in the correct way.
Actually, it's not just specific cases; it happens in lots of network cases where we need to use wait_for_serial_login().
For test results: I got the arm acceptance tests run with the help of Yingshun, and there is no negative effect on some s390 tests according to Sebas' comments. I have also run a network job of a few hundred cases with this PR.
As for not finding the reset screen: you can only encounter this issue when using wait_for_serial_login().
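To make the failure mode concrete, here is a minimal sketch of a serial wait loop that dismisses a firmware reset screen and still gives up after a timeout. It is not the avocado-vt implementation of wait_for_serial_login(). The function name, the `read_chunk` and `send_key` callbacks, the prompt regex, and the `RESET_MARKER` placeholder are all assumptions made for illustration; the real firmware text is not reproduced here.

```python
import re
import time

# Placeholder only, NOT the real firmware message.
RESET_MARKER = "RESET-SCREEN"


def wait_for_login_prompt(read_chunk, send_key, timeout=60.0,
                          reset_marker=RESET_MARKER,
                          prompt_re=r"login:\s*$",
                          clock=time.monotonic):
    """Return True once a login prompt is seen, False on timeout.

    read_chunk() is assumed to block briefly and return new console
    text (possibly ""); send_key(text) writes to the console.
    """
    prompt = re.compile(prompt_re)
    deadline = clock() + timeout
    seen = ""
    while clock() < deadline:
        seen += read_chunk()
        if reset_marker in seen:
            # Dismiss the reset screen, then drop what was read so the
            # same marker is not matched a second time.
            send_key("\n")
            seen = ""
            continue
        if prompt.search(seen):
            return True
    return False
```

The deadline check is what prevents waiting forever: a guest that never reaches the prompt makes the function return False instead of hanging.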
vmxml.sync() supports passing in "--keep-nvram". Did we ever try vmxml.sync("--keep-nvram")? That way, the nvram file can be kept.
Yes, we're going to set --keep-nvram as default option, which is another issue to be discussed.
Why does this issue appear in our testing? Most likely because we bluntly run virsh undefine --nvram first, then define the VM with the same image. This is not the correct way from a user's perspective. In this situation, the way forward is to correct how we define and undefine the VM, rather than using a workaround that hides the incorrect usage.
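For reference, the two undefine variants being contrasted differ only in one flag. The sketch below just builds the command line and does not run it; the helper name `undefine_argv` and the domain name in the usage note are hypothetical, while `--nvram` and `--keep-nvram` are the `virsh undefine` options already mentioned in this thread.

```python
def undefine_argv(domain, keep_nvram=True):
    """Build a `virsh undefine` command line as an argument list.

    keep_nvram=True  -> the NVRAM file survives the undefine
    keep_nvram=False -> the NVRAM file is removed with the domain
    """
    flag = "--keep-nvram" if keep_nvram else "--nvram"
    return ["virsh", "undefine", domain, flag]
```

For example, `undefine_argv("avocado-vt-vm1")` yields the variant that leaves the EFI variables in place, so a later define and start of the same image should not go through the first-boot path.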
Although after we use --keep-nvram in sync() we will not hit the "reset" message in most cases when booting the guest, the current PR can still solve some specific cases where --nvram is used, which leads to the "reset" message, and this PR will not impact --keep-nvram cases. So I will merge it and see whether it improves the cases' stability.
To skip the UEFI guest reset process and prevent waiting forever
Test with previously failed cases: