Hi Eddie!

One of my teammates picked your orb to run integration tests against an internal DB instance, tunneling through a bastion host. Thanks for publishing it, BTW; it has been really useful for trimming down the mess that is our 500-line `.circleci/config.yml`!
However, we keep hitting an intermittent issue: the step right after `dmz/open_tunnel` fails with Connection Refused on the tunneled port. Most of the time it works fine, of course, and my usual reading of that kind of symptom is a race in opening the connection.
Meaning that `ssh -f` (I guess!) starts listening on the local socket very soon, but still after forking, which leaves a small window in which `ssh -f` has already returned control to `bash` but nothing is listening on the socket yet. Occasionally the OS scheduler will starve `ssh` and resume the `bash` script instead (which naturally assumes that the tunnel port is already open), and the script hits `ECONNREFUSED`.
I see that the usage examples always add a `curl localhost`, maybe to keep an eye on this problem. It's also easy to work around this kind of issue by explicitly polling the port (effectively synchronizing away the race between `ssh` and `bash`).
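To make the polling idea concrete, here is a minimal sketch of a bounded wait. The helper names `wait_until` and `port_accepts` are made up for illustration, and the probe assumes `curl` is available in the job image; it treats curl's exit code 7 ("failed to connect") as "not listening yet" and any other result as "something accepted the connection".

```shell
# wait_until ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it succeeds; returns 1 if ATTEMPTS tries all fail.
wait_until() {
  local attempts=$1 delay=$2 i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# port_accepts HOST PORT (hypothetical probe)
# Fails only when curl could not connect at all (exit code 7).
# --max-time keeps the probe from hanging on a server that accepts
# the connection but never answers in HTTP, such as a database.
port_accepts() {
  curl -s -o /dev/null --max-time 2 "$1:$2"
  [ "$?" -ne 7 ]
}
```

A step could then call something like `wait_until 100 0.1 port_accepts localhost 1234` and fail the job if the port never opens, instead of looping forever.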
I'd appreciate any comments you have on this. Maybe something as simple as adding:
    while curl -s -o /dev/null localhost:1234; test $? -eq 7; do sleep 0.1; done
after the `ssh -Nf` call in the orb source. What do you think?
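For illustration, this is roughly how I picture the wait sitting right after the `ssh -Nf` call. I have not mirrored the orb's actual script, so the `open_tunnel` function, its arguments, and the 100-try cap are all assumptions on my part:

```shell
# Hypothetical shape of the tunnel step; the orb's real script may differ.
# open_tunnel LOCAL_PORT TARGET_HOST:TARGET_PORT BASTION
open_tunnel() {
  local local_port=$1 target=$2 bastion=$3 tries=0

  # -N: no remote command, -f: go to background, -L: local port forward.
  ssh -Nf -L "${local_port}:${target}" "$bastion" || return 1

  # curl exits with 7 when it cannot connect at all (e.g. connection
  # refused). Any other exit code means something accepted the connection.
  while curl -s -o /dev/null --max-time 2 "localhost:${local_port}"; [ "$?" -eq 7 ]; do
    tries=$((tries + 1))
    # Give up after about 10 s so a tunnel that never opens fails the job.
    [ "$tries" -ge 100 ] && return 1
    sleep 0.1
  done
}
```

The cap is there because an unbounded loop would keep the job spinning if `ssh` backgrounded itself but the forward never came up.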