Closed carolynvs closed 4 years ago
@carolynvs I confirm we have had recurrent random timeouts in our tests since we started using v0.8.0-beta1.
@carolynvs Thanks for the fix. I would still like to understand why the previous condition (WaitConditionNextExit) made ContainerWait hang, since I don't see any blocking code in the ContainerWait method. It confuses me.
Both Porter's and Docker's CI experienced this. I don't want to spend any additional time on this issue, but it was quite clear that this line of code was causing goroutines to never complete. I can flip this PR (or the original referenced one) on/off and reproduce the regression on demand. On Porter's build server, the goroutines lived for however long our test timeout allowed; when they were killed, the routines being reaped were the stdout copy from a bit higher up in that function and this ContainerWait line.
It seems WaitConditionNextExit was not the correct condition to use when the container was started before we began waiting on it: in some cases, the condition we were waiting for had already been reached by the time we began the wait.
If you want to dig in further, feel free to do so but after a week of troubleshooting, I'm beat and just happy to be able to move on. 😅
There was a regression introduced in #169 that I found when upgrading Porter to v0.8.0-beta1. It doesn't reproduce on every machine, but unfortunately it solidly reproduces every time on Porter's build server, which blocked me for a week tracking it down while we tried to upgrade to Go modules and v0.8.0-beta1. 😅
When the docker driver calls ContainerWait, it doesn't always return now, causing that line of code to never exit and eventually hit our test timeouts. I am not sure of the motivation behind the change in #169; we had never seen the bug it described, and long-running bundles were waiting just fine.
I have found two fixes for the regression that unblock Porter from upgrading to v0.8.0-beta1: ContainerWait and ContainerStart.