Open thecadams opened 1 year ago
Maybe the tunnel is torn down after one usage? Just saw this on the remote side from sshd:
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: input drain -> closed
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: rcvd adjust 9127
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: receive packet: type 97
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: rcvd close
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: output open -> drain
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: channel 0: will not send data after close
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: obuf empty
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: close_write
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: output drain -> closed
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: send close
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: send packet: type 97
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: is dead
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug2: channel 0: garbage collecting
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug1: channel 0: free: direct-tcpip, nchannels 8
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: debug3: channel 0: status: The following connections are open:\r\n #0 direct-tcpip (t4 r0 i3/0 o3/0 fd 8/8 cc -1)\r\n #1 direct-tcpip (t4 r1 i0/0 o0/0 fd 9/9 cc -1)\r\n #2 direct-tcpip (t4 r2 i0/0 o0/0 fd 10/10 cc -1)\r\n #3 direct-tcpip (t4 r3 i0/0 o0/0 fd 11/11 cc -1)\r\n #4 direct-tcpip (t4 r4 i0/0 o0/0 fd 1
Mar 07 02:50:29 ip-10-1-3-170.ec2.internal sshd[4333]: Connection closed by 50.17.68.142 port 39023
I found that if I remove the redirectStd calls, which redirect the child process's stdout/stderr back to the provider, the child process outlives the provider process. I believe the child outliving the provider is what the module intends, but for some reason it does not work as-is.
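A plausible mechanism for this, sketched below under assumptions (this is not the provider's actual code, and `waitAfterPipeClose` is a name made up for the demo): if the child's stdout is wired to a pipe held by the provider, the child gets SIGPIPE on its next write once the provider's end of the pipe goes away, so the child dies when the provider does.

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// waitAfterPipeClose starts a child that writes to stdout in a loop
// (like a tunnel process whose output is redirected to the provider),
// then closes the read end of the pipe to simulate the provider
// process going away, and reports what happens to the child.
func waitAfterPipeClose() string {
	cmd := exec.Command("sh", "-c", "while true; do echo tick; sleep 0.1; done")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// Close the read end: the child's next write raises SIGPIPE.
	stdout.Close()

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()
	select {
	case err := <-done:
		return fmt.Sprintf("child exited: %v", err)
	case <-time.After(3 * time.Second):
		cmd.Process.Kill()
		return "child still running"
	}
}

func main() {
	fmt.Println(waitAfterPipeClose())
}
```

On Linux this typically prints `child exited: signal: broken pipe`, which would explain why dropping the redirection lets the child survive the provider.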
@Blefish based on what you mentioned, plus the bufio Scanner.Scan() docs, I have a hypothesis:
Scan panics if the split function returns too many empty tokens without advancing the input. This is a common error mode for scanners.
Pretty sure it's talking about this panic.
If this is the case, the parent's stderr must not be captured in the logs; otherwise we'd see the panic there. It's also plausible that the child died without leaving anything in the tf logs, since the parent died first.
Thoughts on this?
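To illustrate the hypothesis, here is a minimal sketch (not the provider's code; `badSplit` and `scanPanicMessage` are names invented for this demo) of the panic the bufio docs describe: a split function that keeps returning empty tokens at EOF without advancing the input makes Scan panic after ~100 iterations.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// badSplit asks for more data until EOF, then keeps returning an
// empty token without advancing the input -- the error mode the
// bufio docs warn about.
func badSplit(data []byte, atEOF bool) (int, []byte, error) {
	if !atEOF {
		return 0, nil, nil // request more data
	}
	return 0, []byte{}, nil // empty token, no progress
}

// scanPanicMessage drives a Scanner with badSplit and returns the
// recovered panic message, or "no panic" if none occurred.
func scanPanicMessage() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	s := bufio.NewScanner(strings.NewReader("some input"))
	s.Split(badSplit)
	for s.Scan() {
	}
	return "no panic"
}

func main() {
	fmt.Println(scanPanicMessage())
}
```

If a split function inside the provider hit this mode, the panic would land on the parent's stderr, which matches the "we'd only see it if parent stderr were logged" reasoning above.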
@Blefish you were right: ignoring the child process's stdout and stderr seems to prevent the child process from crashing. My fork has the change you described, and that fixes the issue for me. Thanks for the suggestion!
@thecadams thanks for putting up the fork! I've managed to get it to work when executing Terraform locally, but unfortunately remote execution on Terraform Cloud does not work. Terraform Cloud shows the same behaviour you describe even with your fork installed: the ssh tunnel stops 2 or 3 seconds after it starts.
You can try release v0.2.3
Hi @AndrewChubatiuk, thanks for this module; I'm hoping to make it work over here!
Looks like the tunnel is closed from the Terraform side, about 1-3 seconds after being opened.
Logs: https://gist.github.com/thecadams/e3dc630cadadc9018946fef98aea26ca
Of particular interest in the tf log is this line:
I have a config similar to this:
The `rc_prometheus` module manages 1 grafana folder and several dashboards in that folder. Unfortunately, despite the grafana provider getting the correct host and port, I get `connection refused`, as it seems the connection shuts down too fast. I also tried using `time_sleep` resources and provisioners in various places, but nothing worked.

Expected Behavior
There should be a way to control when the tunnel closes.
Actual Behavior
Tunnel closes within 1-3 seconds, causing `connection refused` errors in the module.

Steps to Reproduce
Something like the config above should repro this.
Important Factoids
Looks like recent changes in this fork removed the "close connection" provider, maybe that should be reinstated to support this use case?
You'll also notice entries in the logs like this, which are unrelated; they appear because I moved the ssh tunnel out of the module since the previous apply:
References