avocado-framework / avocado

Avocado is a set of tools and libraries to help with automated testing. One can call it a test framework with benefits. Native tests are written in Python and they follow the unittest pattern, but any executable can serve as a test.
https://avocado-framework.github.io/
Other
345 stars 343 forks source link

Add exception handling for OSError that my be thrown by os.fsync #6019

Closed smitterl closed 1 month ago

smitterl commented 2 months ago

We've seen instances where "Bad file descriptor" OSError was raised. This looks similar to 52d69bed34982ba36b535c70293ea686fedb1898. Handle in the same way by ignoring the error and continuing execution.

smitterl commented 2 months ago

Sample error trace:

2024-09-03 07:26:13,865 stacktrace       L0041 ERROR| Reproduced traceback from: /var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/core/test.py:778
2024-09-03 07:26:13,866 stacktrace       L0045 ERROR| Traceback (most recent call last):
2024-09-03 07:26:13,866 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado-vt/avocado_vt/test.py", line 175, in setUp
2024-09-03 07:26:13,866 stacktrace       L0045 ERROR|     if utils_package.package_install("tar"):
2024-09-03 07:26:13,866 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado-vt/virttest/utils_package.py", line 237, in package_install
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     return utils_misc.wait_for(mgr.install, timeout)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado-vt/virttest/utils_misc.py", line 581, in wait_for
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     output = func()
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado-vt/virttest/utils_package.py", line 190, in install
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self.func = super(LocalPackageMgr, self).__getattr__("install")
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/software_manager/manager.py", line 42, in __getattr__
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self._init_on_demand()
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/software_manager/manager.py", line 38, in _init_on_demand
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self.backend = backend()
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/software_manager/backends/dnf.py", line 21, in __init__
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     super(DnfBackend, self).__init__(cmd='dnf')
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/software_manager/backends/yum.py", line 45, in __init__
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self._set_version(cmd)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/software_manager/backends/yum.py", line 74, in _set_version
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     result = process.run(self.base_command + '--version',
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 1090, in run
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     cmd_result = sp.run(timeout=timeout)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 962, in run
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self.wait(timeout, sig)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 898, in wait
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self._fill_results(rc)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 742, in _fill_results
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self._fill_streams()
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 754, in _fill_streams
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     self._stderr_drainer.flush()
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|   File "/var/ci/libvirt-ci/runtest/avocado-vt/avocado/avocado/utils/process.py", line 521, in flush
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR|     os.fsync(fileno)
2024-09-03 07:26:13,867 stacktrace       L0045 ERROR| OSError: [Errno 9] Bad file descriptor
richtja commented 2 months ago

Hi @smitterl thank you for opening this PR. Even though your change would fix the error message, I don't think we should do that, because IMO it hides the real issue. From your error trace sample, it looks like there might be some issue in avocado.utils.software_manager or in avocado-vt.virttest.utils_package, because this error implicates that the process wasn't terminated properly. Is it possible to provide some reproducer for this? I would like to investigate it before we will continue with this PR. Thank you.

smitterl commented 2 months ago

Hi @smitterl thank you for opening this PR. Even though your change would fix the error message, I don't think we should do that, because IMO it hides the real issue. From your error trace sample, it looks like there might be some issue in avocado.utils.software_manager or in avocado-vt.virttest.utils_package, because this error implicates that the process wasn't terminated properly. Is it possible to provide some reproducer for this? I would like to investigate it before we will continue with this PR. Thank you.

This only happens some times in our environments. When it happens, from what I can tell, it's after the test fails and avocado(-vt?) is tearing everything down. You could try and run some of the tp-libvirt tests several times

# avocado run --vt-type libvirt --test-runner=runner --vt-machine-type q35 virsh.migrate_network.ovs_br.without_postcopy --vt-connect-uri qemu:///system
# avocado run --vt-type libvirt --test-runner=runner --vt-machine-type s390-virtio migration.destructive_operations_around_live_migration.migration_kill_libvirt_daemon.autoconverge_option.kill_dest_virtproxyd.with_precopy.p2p --vt-connect-uri qemu:///system

I'll send you a full test log via email.

smitterl commented 1 month ago

Closing in favor of https://github.com/avocado-framework/avocado-vt/pull/4001