I'm going to ask some of the other behave test users I know of to try this out, since it's quite a significant change. But it seems to be critical for getting GitHub Actions CI working again for our images: an individual test that blocks will now fail without aborting the whole test run.
I've pushed this to my fork's v1 branch too, to make it easier to try out.
(commit message follows)
Work around docker.APIClient.exec_start sometimes blocking indefinitely by running it in a sub-process and raising an exception if the sub-process does not complete within a given timeout.
Remove the existing post-exec code, which polled the value of docker.APIClient.exec_inspect for 15 seconds to determine whether the command had completed. Waiting on the new sub-process now effectively does the same job. I've set the timeout to 30 seconds, up from 15, which (from experimentation) seems to be necessary to account for the extra time spent invoking exec_start, which now falls within the timeout period.
A future change should make this timeout configurable.
This general pattern (of watchdogging the docker library code) might be useful elsewhere, in particular for any future efforts to support parallel test execution.
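
For anyone who wants to try the watchdog pattern in isolation, here is a minimal sketch of the idea. It is not the code from this commit. The names run_with_timeout, WatchdogTimeout and _worker are hypothetical, and it assumes the POSIX-only "fork" start method so that the watched callable does not need to be picklable.

```python
import multiprocessing


class WatchdogTimeout(Exception):
    """Raised when the watched call does not finish within the timeout."""


def _worker(conn, func, args):
    # Runs in the child: send back either the result or a description of the error.
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_with_timeout(func, args=(), timeout=30):
    """Run func(*args) in a sub-process; raise WatchdogTimeout if it takes too long."""
    # Assumption: the "fork" start method (POSIX only), so func is inherited, not pickled.
    ctx = multiprocessing.get_context("fork")
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(send_end, func, args))
    proc.start()
    send_end.close()  # the parent keeps only the read end
    try:
        # poll() returns False if nothing arrived within the timeout.
        if not recv_end.poll(timeout):
            raise WatchdogTimeout(f"call did not complete within {timeout} seconds")
        try:
            status, payload = recv_end.recv()
        except EOFError:
            raise RuntimeError("sub-process exited without returning a result")
    finally:
        # Kill a child that is still blocked, then reap it.
        if proc.is_alive():
            proc.terminate()
        proc.join()
        recv_end.close()
    if status == "error":
        raise RuntimeError(payload)
    return payload
```

A blocked call then surfaces as a WatchdogTimeout in the caller, which a test runner can report as a single failed test instead of hanging. The return value has to be picklable to travel back over the pipe, and wrapping a real docker.APIClient.exec_start call this way is left out of the sketch.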