rework container.execute to use multiprocessing as watchdog

jmtd commented 7 months ago

I'm going to ask some of the other behave test users I know of to try this out, since it's quite a significant change. It seems to be critical to getting GitHub Actions CI working again for our images: an individual test that blocks will now fail without aborting the whole test run.

I've pushed this to my fork's v1 branch too, to make it easier to try out.

(commit message follows)

Work around docker.APIClient.exec_start sometimes blocking indefinitely by running it in a sub-process and raising an exception if the sub-process does not complete within a given timeout.
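A minimal sketch of the watchdog idea, assuming a POSIX host. This is not the code in this PR: `run_with_watchdog`, `ExecTimeoutError` and the two stand-in commands are hypothetical names, and a stand-in replaces the real `exec_start` call so the sketch runs without Docker. The child sends its result back over a queue; if nothing arrives within the timeout, the parent kills the child and raises.

```python
import multiprocessing
import queue
import time


class ExecTimeoutError(Exception):
    """Raised when the watched call does not finish within the timeout."""


def _child(results, func, args):
    # Runs inside the sub-process: report either the return value or the error.
    try:
        results.put(("ok", func(*args)))
    except Exception as exc:  # exceptions may not pickle, so send their repr
        results.put(("error", repr(exc)))


def run_with_watchdog(func, args=(), timeout=30.0):
    """Call func(*args) in a forked sub-process; kill it if it overruns timeout."""
    ctx = multiprocessing.get_context("fork")  # POSIX-only start method
    results = ctx.Queue()
    proc = ctx.Process(target=_child, args=(results, func, args))
    proc.start()
    try:
        # Read before joining: a child still flushing a large result into
        # the queue would otherwise never exit, and join() would hang.
        status, payload = results.get(timeout=timeout)
    except queue.Empty:
        proc.terminate()  # the call is presumed hung, so kill the child
        proc.join()
        raise ExecTimeoutError(f"call did not finish within {timeout} seconds")
    proc.join()
    if status == "error":
        raise RuntimeError(f"watched call failed: {payload}")
    return payload


def fast_command():
    # Stands in for an exec_start call that returns promptly.
    return b"hello\n"


def hanging_command():
    # Stands in for an exec_start call that never returns.
    time.sleep(60)
```

Unlike a thread, a sub-process can be killed outright with `terminate()`, which is presumably why multiprocessing was chosen here: a thread stuck inside `exec_start` cannot be interrupted from outside. The `"fork"` context keeps the sketch free of a `__main__` guard; it is not available on Windows.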

Remove the existing post-exec code, which polled docker.APIClient.exec_inspect for 15 seconds to determine whether the command had completed. Waiting on the new sub-process now effectively performs that check. I've set the timeout to 30 seconds, up from 15; experimentation suggests the extra time is needed because the timeout period now also has to cover invoking exec_start itself.
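For contrast, a polling loop of the kind this commit removes might look like the sketch below. It is an illustration, not the project's actual removed code: `wait_for_exec`, its `interval` parameter and `FakeClient` are hypothetical, while `exec_inspect` and its `Running`/`ExitCode` fields come from docker-py and the Docker Engine API. Such a loop only starts after `exec_start` has returned, so it cannot help when `exec_start` itself blocks.

```python
import time


def wait_for_exec(client, exec_id, timeout=15.0, interval=0.5):
    """Poll client.exec_inspect until the exec finishes; return its exit code."""
    deadline = time.monotonic() + timeout
    while True:
        info = client.exec_inspect(exec_id)
        if not info["Running"]:
            return info["ExitCode"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"exec {exec_id} still running after {timeout}s")
        time.sleep(interval)


class FakeClient:
    """Stand-in for docker.APIClient: reports 'running' for a few polls."""

    def __init__(self, polls_until_done, exit_code=0):
        self.remaining = polls_until_done
        self.exit_code = exit_code
        self.calls = 0

    def exec_inspect(self, exec_id):
        self.calls += 1
        running = self.remaining > 0
        self.remaining -= 1
        return {"Running": running, "ExitCode": None if running else self.exit_code}
```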

A future change should make this timeout configurable.
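A minimal sketch of how the timeout could be made configurable, keeping 30 seconds as the default. The variable name `BEHAVE_TIMEOUT` is borrowed from spolti's recollection later in this thread and is an assumption; nothing here confirms that behave-test-steps reads it. `exec_timeout` and `DEFAULT_EXEC_TIMEOUT` are hypothetical names.

```python
import math
import os

DEFAULT_EXEC_TIMEOUT = 30.0  # seconds, the value chosen in this commit


def exec_timeout(environ=None, var="BEHAVE_TIMEOUT", default=DEFAULT_EXEC_TIMEOUT):
    """Return the watchdog timeout in seconds, overridable via an env var.

    The variable name is hypothetical; pass `environ` to test without
    touching the real process environment.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get(var, "").strip()
    if not raw:
        return default  # unset or blank: fall back to the default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{var} must be a number of seconds, got {raw!r}") from None
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"{var} must be a positive, finite number, got {raw!r}")
    return value
```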

This general pattern (watchdogging the docker library code) might be useful elsewhere, particularly for any future effort to support parallel test execution.

spolti commented 7 months ago

IIRC there is an environment variable you can set to increase the timeout, BEHAVE_TIMEOUT I think. @rnc

rnc commented 4 months ago

@jmtd Is this ready to merge?

jmtd commented 4 months ago

Yes, sorry for the delay.

cekit / behave-test-steps

rework container.execute to use multiprocessing as watchdog #50