avocado-framework / avocado-vt

Avocado VT Plugin
https://avocado-vt.readthedocs.org/
Other
83 stars 241 forks source link

vt_console:Skip waiting screen when login via console #3948

Closed chloerh closed 1 month ago

chloerh commented 1 month ago

To skip the UEFI guest reset process and prevent the forever waiting

Test with previously failed cases:

 (1/2) type_specific.io-github-autotest-libvirt.virtual_network.passt.connectivity_between_2vms.ip_portfw.root_user: PASS (202.79 s)
 (2/2) type_specific.io-github-autotest-libvirt.virtual_network.passt.connectivity_between_2vms.ip_portfw.non_root_user: PASS (67.56 s)
RESULTS    : PASS 2 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
 (1/6) type_specific.io-github-autotest-libvirt.virtual_devices.filesystem_device_unprivileged.one_fs.with_hugepages.two_guests.coldplug: PASS (86.80 s)
 (2/6) type_specific.io-github-autotest-libvirt.virtual_devices.filesystem_device_unprivileged.one_fs.with_memfd.two_guests.coldplug: PASS (83.90 s)
 (3/6) type_specific.io-github-autotest-libvirt.virtual_devices.filesystem_device_unprivileged.one_fs.with_shm.two_guests.coldplug: PASS (88.41 s)
 (4/6) type_specific.io-github-autotest-libvirt.virtual_devices.filesystem_device_unprivileged.two_fs.with_hugepages.two_guests.coldplug: PASS (92.72 s)
 (5/6) type_specific.io-github-autotest-libvirt.virtual_devices.filesystem_device_unprivileged.two_fs.with_memfd.two_guests.coldplug: PASS (92.89 s)
 (6/6) type_specific.io-github-autotest-libvirt.virtual_devices.filesystem_device_unprivileged.two_fs.with_shm.two_guests.coldplug: PASS (90.67 s)
RESULTS    : PASS 6 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
chunfuwen commented 1 month ago

@smitterl Can you help check this from s390x perspective?

chloerh commented 1 month ago

Run arm acceptance test, got no failure. @chunfuwen @smitterl anything on s390?

smitterl commented 1 month ago

IIUC this is trying to handle a very specific UEFI boot up screen that is only shown on first boot resp. when there's no nvram file. If so, in https://github.com/avocado-framework/avocado-vt/pull/3849 the proposed solution is to handle that file. Would this also be helpful here? @chloerh It is used to avoid the system reset in these tests: https://github.com/autotest/tp-libvirt/pull/5465

smitterl commented 1 month ago

@smitterl Can you help check this from s390x perspective?

From s390x perspective: The reset string won't be shown, so I am worried that taking away the is_responsive part will have a negative impact. What do you think?

chloerh commented 1 month ago

IIUC this is trying to handle a very specific UEFI boot up screen that is only shown on first boot resp. when there's no nvram file. If so, in #3849 the proposed solution is to handle that file. Would this also be helpful here? @chloerh It is used to avoid the system reset in these tests: autotest/tp-libvirt#5465

I found it could be helpful to the problem I'm trying to fix with the current pr. The problem is what's being described in the comment below. I also think we need to have a discussion with the team before we make the change to sync() function.

Thank you for bringing this up, I believe this could solve from the root.


        # Backup the EFI vars file before sync to prevent reset/reboot issue
        # Updating a VM's XML using vmxml.sync() deletes the nvram file, which on a UEFI VM
        # causes an unwanted reset/reboot.
chunfuwen commented 1 month ago

Moreover, xml.sync() is commonly and widely used in many job features, such as virtual disk , detach and attach related cases. From 9.5 rhel jobs test results(all cases pass in multiple round), I can not conclude there is reset screen appearance. So I wonder whether it is specific to VM xml ,for example having some special device in VM. We may need dig more details about this

Therefore, since this is specific issue, so I am still concerned about necessity of this PR change. It looks like we overturn previous general solution in order to resolve specific issue

chunfuwen commented 1 month ago

Additionally, Can we consider to address this issue in specific tp-libvirt file? such as https://github.com/autotest/tp-libvirt/pull/5465.

If reset/reboot happen in some specific cases, we need reconsider whether the test case manipulate VM life cycle in correct way.

chloerh commented 1 month ago

Additionally, Can we consider to address this issue in specific tp-libvirt file? such as autotest/tp-libvirt#5465.

If reset/reboot happen in some specific cases, we need reconsider whether the test case manipulate VM life cycle in correct way.

chunfuwen commented 1 month ago

vmxml.sync() support pass in "--keep-nvram", did we ever try to use vmxml.sync("--keep-nvram") ? In this way, nvram file can be kept.

chloerh commented 1 month ago

vmxml.sync() support pass in "--keep-nvram", did we ever try to use vmxml.sync("--keep-nvram") ? In this way, nvram file can be kept.

Yes, we're going to set --keep-nvram as default option, which is another issue to be discussed.

chunfuwen commented 1 month ago

Why this issue appear to our testing? Most possibility is that we rudely virsh undefine --nram firstly, then define VM wth the same image. This is not correct way from user's perspective. In this situation, the forward solution is to correct the way we define and undefine Vm, rather that use workaround solution to hide the incorrect usage way.

dzhengfy commented 1 month ago

Althrough after we use --keep-nvram in sync(), we will not happen to "reset" message in most cases when booting the guest, current PR still can solve some specific cases when --nvram is used which leads to "reset" message and this PR will not impact --keep-nvram cases. So I will merge it and see if any improvement for cases' stability.