Luap99 closed this 3 days ago
@edsantiago @baude @ashley-cui PTAL
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: edsantiago, Luap99
LGTM
/lgtm
I think I noticed one weird failure pattern:
$ podman machine init --disk-size 11 --image /private/tmp/ci/podman-machine-daily.aarch64.applehv.raw foo1
[FAILED] Timed out after 240.001s.
...
-> next test
$ podman machine init --disk-size 11 --image /private/tmp/ci/podman-machine-daily.aarch64.applehv.raw f357ac67e822
Error: truncate /private/tmp/ci/podman_test9067091/.local/share/containers/podman/machine/applehv/foo1-arm64.raw: no such file or directory
Machine init complete
To start your machine run:
podman machine start f357ac67e822
-> this one is a success despite the error message?! And notice how the error path contains the machine name from the previous failed test.
I see this pattern in basically all my failed runs here.
My best guess is that this was caused by https://github.com/containers/podman/pull/23068. I know we had this flake before, but the fact that it suddenly got this bad suggests to me that something must have changed to cause it. Looking at the runs there, it took 7 tries: https://cirrus-ci.com/task/5748607108775936
I also pushed https://github.com/containers/podman/pull/23162, which should add useful debug output to help find the root cause.
It took 13 tries to get the macOS machine test to pass.
pkg/machine/e2e: use tmp file for connections
On Linux and macOS the connections are stored under the home directory by default, so it is not a problem there. On Windows, however, we first check the APPDATA env var and use that directory as config storage. That directory is not cleaned up after each test, so connections can leak into the following test and cause failures there.
Fixes https://github.com/containers/podman/issues/22844
pkg/machine/e2e: fix broken cleanup
Currently all podman machine rm errors in AfterEach are ignored. This means some machines leaked and caused issues later on, see https://github.com/containers/podman/issues/22844.
To fix it, first rework the logic to only remove machines when needed, at the place where they are created, using DeferCleanup(). However, DeferCleanup() does not work well together with AfterEach(), as Ginkgo always runs AfterEach() before DeferCleanup(). Since AfterEach() deletes the directory, the podman machine rm call cannot be done afterwards.
As such, migrate all cleanup to use DeferCleanup(), and while touching this code, remove the per-file duplication and define the setup/cleanup once in the global scope.
Does this PR introduce a user-facing change?