cockpit-project / cockpit-machines

Cockpit UI for virtual machines
GNU Lesser General Public License v2.1

Debug flakes shutdown #1701

Closed: jelly closed this 5 days ago

jelly commented 1 week ago

This is the top flake on Fedora 40, and basically all of the failures are waiting for the host to shut down. Let's just bump the timeout and see if that helps.

https://cockpit-logs.us-east-1.linodeobjects.com/pull-1693-69328051-20240626-133906-fedora-40/log.html
https://cockpit-logs.us-east-1.linodeobjects.com/pull-1697-547cf65d-20240626-154604-fedora-40-firefox/log.html
https://cockpit-logs.us-east-1.linodeobjects.com/pull-1679-576c973f-20240616-083805-fedora-40/log.html
https://cockpit-logs.us-east-1.linodeobjects.com/pull-1697-547cf65d-20240626-144920-fedora-40-firefox/log.html
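Bumping the timeout amounts to polling for the expected VM state for longer before giving up. A minimal sketch of such a poll loop, assuming the shutdown check can be expressed as a zero-argument predicate; `wait_until()` and its 300-second default are hypothetical and not taken from machineslib.py.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 300.0, interval: float = 1.0) -> None:
    """Poll predicate() until it returns True, or raise TimeoutError after `timeout` seconds.

    Hypothetical helper for illustration; the real machineslib.py code and timeouts may differ.
    """
    # A monotonic clock keeps the deadline immune to wall-clock adjustments
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Back off between checks so a slow, emulated guest is not hammered with queries
        time.sleep(interval)
```

For example, `wait_until(lambda: vm_state() == "shut off", timeout=600)`, where `vm_state()` is a hypothetical helper returning the VM's current state. Tolerating slower shutdowns then only requires raising `timeout`.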

jelly commented 1 week ago

Touching only machineslib.py does not make the tests re-run 3 times. Let's touch one test file, for science.

jelly commented 1 week ago

@martinpitt please take a look. Since changing only machineslib.py does not trigger retries, I still have a debug commit, so you can see that the change did in fact make CI less flaky :)

jelly commented 1 week ago

@martinpitt requested some investigation into whether this is caused by too many parallel runs. We will soon have --amplify to check this easily.
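Checking whether parallel runs cause the flake boils down to running the same test many times, once serially and once concurrently, and comparing the failure counts. A rough sketch of that idea; `amplify()` is a hypothetical helper that uses threads as a stand-in for parallel test VMs, and it is not how the upcoming --amplify option is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def amplify(test: Callable[[], None], runs: int = 10, parallel: int = 1) -> int:
    """Run `test` `runs` times with up to `parallel` concurrent workers and return the failure count.

    Hypothetical sketch; threads only approximate the load of real parallel test VMs.
    """
    def attempt(_: int) -> bool:
        # Treat any exception raised by the test as a failed run
        try:
            test()
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=parallel) as pool:
        results = list(pool.map(attempt, range(runs)))
    return results.count(False)
```

If `amplify(test, runs=20, parallel=1)` rarely fails but `amplify(test, runs=20, parallel=4)` fails often, load from parallel runs becomes the likely suspect.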

martinpitt commented 1 week ago

@jelly If waiting longer helps, by all means go for it! It's plausible if that is a "soft" shutdown, i.e. the OS actually has to shut down (as opposed to just killing the VM). It's all emulated, so the inner VMs are sloooow 🐌
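In libvirt terms, the soft/hard distinction corresponds roughly to `virsh shutdown` (ask the guest OS to shut itself down, which is slow under emulation) versus `virsh destroy` (stop the VM immediately). A minimal sketch, assuming `virsh` is available and that a long wait followed by a forced stop is acceptable; `shut_down()`, `run_virsh()`, and their defaults are hypothetical and not what the cockpit-machines tests actually do.

```python
import subprocess
import time
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_virsh(args: List[str]) -> str:
    """Run a virsh subcommand and return its stdout (requires libvirt's virsh on PATH)."""
    result = subprocess.run(["virsh", *args], capture_output=True, text=True, check=True)
    return result.stdout


def shut_down(domain: str, timeout: float = 300.0, interval: float = 2.0, run: Runner = run_virsh) -> bool:
    """Request a soft shutdown; force the VM off if it is not down within `timeout` seconds.

    Returns True if the guest OS shut down by itself, False if it had to be destroyed.
    Hypothetical helper for illustration.
    """
    run(["shutdown", domain])  # soft: the guest OS has to go through its own shutdown
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if run(["domstate", domain]).strip() == "shut off":
            return True
        time.sleep(interval)
    run(["destroy", domain])  # hard: kill the VM without waiting for the guest
    return False
```

The `run` parameter exists only so the sketch can be exercised without libvirt; in a real test it would call `virsh` directly.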

jelly commented 6 days ago


I did some more testing in https://github.com/cockpit-project/cockpit-machines/pull/1705: too much amplification makes the VM crash, like here. So I'd say this is good to merge?

jelly commented 5 days ago

It still fails; this needs a deeper dive:

https://cockpit-logs.us-east-1.linodeobjects.com/pull-1701-72252da6-20240703-090406-debian-stable/TestMachinesLifecycle-testBasic-debian-stable-127.0.0.2-2901-FAIL.png