dougsland closed this issue 1 year ago
I will investigate this one.
Moving to @mwperina as he started looking into this before me.
First, this only affects locally run integration tests. In the GitHub CI this works fine; unfortunately, I don't know why.
As the error clearly states, the failure occurs because a container with a specific name (the controller, in this case) already exists when the test tries to start another one with the same name. This can have multiple causes. For example, it happened to me that the first `hirte-controller` container was started and a subsequent command on that container failed because "the container is not ready". All tests after it then fail as well, since the pending `hirte-controller` is blocking the name. The "container not ready" error should have been avoided by waiting for `condition=running`, but it wasn't; I'm not sure whether this is a bug in the podman Python API or a misuse of it on our side.
One possible approach to avoid those cascading failures would be to not set a container name at all, which is fine since we only use the Python reference to the containers anyway. We would then need to properly clean up those pending containers. I don't know how to tackle the "container not ready" error, since it should not happen when we wait for the running condition - well, it should not. Do you have any idea? @dougsland @mwperina
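A minimal sketch of the alternative approach of keeping names but randomizing them (the `unique_container_name` helper and the `hirte-controller` prefix usage are hypothetical, not part of the current test code). A random suffix makes a leftover container from a previous run unable to collide with the new one:

```python
import uuid


def unique_container_name(prefix: str) -> str:
    """Return a collision-free container name such as 'hirte-controller-3f9a1b2c'.

    A leftover container from an earlier, failed test run keeps its old
    name, so a freshly generated name can never hit the
    'container name already in use' error.
    """
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


# Hypothetical usage inside a test fixture:
#   name = unique_container_name("hirte-controller")
#   container = client.containers.run(image, name=name, detach=True)
```

Pending containers would still need a cleanup pass afterwards (e.g. by label rather than by name), since the random names make them harder to find by hand.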
```shell
podman stop control
podman stop node-1
podman stop node-foo
podman stop node-bar
podman container prune
```
did the trick for now; if it appears again, I will investigate whether we can work around it.
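The manual cleanup above could be scripted in the test harness. This is a sketch only; the `cleanup_commands` helper and the node list are hypothetical, and each `podman stop` should be allowed to fail harmlessly when the container is already gone:

```python
def cleanup_commands(container_names):
    """Build the podman cleanup sequence used above:
    stop each named container, then prune all stopped containers.

    Returns a list of argv lists suitable for subprocess.run().
    """
    cmds = [["podman", "stop", name] for name in container_names]
    # --force skips podman's interactive confirmation prompt for prune.
    cmds.append(["podman", "container", "prune", "--force"])
    return cmds


# Hypothetical usage in a test teardown:
#   import subprocess
#   for cmd in cleanup_commands(["control", "node-1", "node-foo", "node-bar"]):
#       subprocess.run(cmd, check=False)  # ignore failures for missing containers
```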
**Describe the bug**
It looks like the test scripts must be smart enough to clean the environment before starting nodes with the same name, or must generate names randomly.
The error in the logs: