Anything in the debug logs? (Run with launchpad --debug apply, or take a look at ~/.mirantis-launchpad/cluster/<CLUSTER_NAME>/install.log.)
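To pull just the failures out of install.log, a small filter over its lines is enough. This is a sketch that assumes the file uses logrus-style key=value text fields (time="...", level=..., msg="..."), as the apply log line quoted later in this thread does. parse_line, error_messages and errors_in_file are hypothetical helper names, not part of launchpad.

```python
import re

# One key=value field: the value is either "quoted (with \-escapes)" or a bare word.
FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return the key/value fields of one logrus-style text log line as a dict."""
    fields = {}
    for key, value in FIELD.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # drop the surrounding quotes, keep escapes as-is
        fields[key] = value
    return fields


def error_messages(lines):
    """Yield the msg field of every entry logged with level=error."""
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") == "error":
            yield fields.get("msg", "")


def errors_in_file(path):
    """Read a log file such as install.log and return its error messages."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(error_messages(fh))
```

For example, errors_in_file(os.path.expanduser("~/.mirantis-launchpad/cluster/<CLUSTER_NAME>/install.log")), with <CLUSTER_NAME> replaced by the real cluster name. Escapes such as \n inside msg are left as literal text rather than decoded.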
The previous phase runs apt-get install -y -q curl apt-utils socat iputils-ping, but I don't see why that would kill the connection.
Just a thought that popped into my head: could sshd be configured to require keepalives? It's possible that launchpad does not send SSH keepalives. A bit far-fetched, but I guess it could happen in theory.
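If the keepalive theory is worth ruling out, note that there are two layers. SSH-level keepalives (the client's ServerAliveInterval, sshd's ClientAliveInterval) are messages sent inside the encrypted channel, while TCP keepalives are probes the kernel sends on an otherwise idle connection (sshd's TCPKeepAlive option controls those on the server side). The sketch below only shows the TCP-level variant, enabled on a client socket with Python's standard socket module. It illustrates the mechanism and is not how launchpad opens its SSH connections; enable_tcp_keepalive is a hypothetical helper and the 30/10/3 timings are arbitrary assumptions.

```python
import socket


def enable_tcp_keepalive(sock, idle=30, interval=10, count=3):
    """Turn on kernel TCP keepalive probes for a socket.

    idle:     seconds of silence before the first probe
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel gives up on the connection
    The three tuning options are platform-specific (Linux has all of them),
    so each one is only set where this Python build exposes it.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

Keepalives only help an established connection survive idle periods; they would not explain new SYNs going unanswered.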
With --debug it hangs here:
INFO[0228] x.x.x.4: installing engine (19.03.12)
INFO[0228] x.x.x.4: installing engine (19.03.12)
DEBU[0228] x.x.x.5: + sudo -E sh -c 'apt-get update -qq'
DEBU[0228] x.x.x.5: + sudo -E sh -c 'apt-get install -y -qq apt-transport-https ca-certificates curl software-properties-common >/dev/null'
DEBU[0228] x.x.x.5: curl: (22) The requested URL returned error: 404 Not Found
DEBU[0228] x.x.x.5: + sudo -E sh -c 'curl -fsSL https://repos.mirantis.com/ubuntu/gpg | apt-key add -qq - >/dev/null'
DEBU[0228] x.x.x.5: + sudo -E sh -c 'add-apt-repository '\''deb [arch=amd64] https://repos.mirantis.com/ubuntu xenial stable'\'' >/dev/null'
DEBU[0228] x.x.x.5: + sudo -E sh -c 'apt-get update -qq >/dev/null'
DEBU[0228] x.x.x.5: + sudo -E sh -c 'apt-get install -y --allow-downgrades -qq docker-ee=5:19.03.12~3-0~ubuntu-xenial docker-ee-cli=5:19.03.12~3-0~ubuntu-xenial'
INFO[0228] x.x.x.5: installing engine (19.03.12)
INFO[0229] x.x.x.4: installing engine (19.03.12)
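With several hosts interleaving output, it helps to reduce the console log to the last thing each host reported. A rough sketch, assuming every line has the LEVEL[number] host: message shape seen above (the bracketed number is presumably elapsed seconds) and that hosts are IPv4 addresses or hostnames; an IPv6 address would confuse the host pattern. The "+ " lines look like shell xtrace (set -x) output from the install script, so they are tracked separately as commands. last_activity is a hypothetical helper name.

```python
import re

# LEVEL[elapsed] host: message, e.g. "DEBU[0228] x.x.x.5: + sudo -E sh -c '...'"
LINE = re.compile(
    r"^(?P<level>[A-Z]{4})\[(?P<secs>\d+)\]\s+(?P<host>[^:\s]+):\s?(?P<msg>.*)$"
)


def last_activity(lines):
    """Return two dicts keyed by host.

    last_line: (elapsed, message) of the most recent line logged for the host
    last_cmd:  (elapsed, command) of the most recent "+ ..." trace line, i.e.
               the last shell command the install script reported starting
    """
    last_line, last_cmd = {}, {}
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            continue  # not a per-host console line
        host, secs, msg = m["host"], int(m["secs"]), m["msg"]
        last_line[host] = (secs, msg)
        if msg.startswith("+ "):
            last_cmd[host] = (secs, msg[2:])
    return last_line, last_cmd
```

Applied to the output above, x.x.x.5's last command is the docker-ee apt-get install, while x.x.x.4 never gets past "installing engine".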
The apply log has this message repeated:
time="29 Sep 20 14:33 BST" level=error msg="x.x.x.4: failed to install engine -> All attempts fail:\n#1: wait: remote command exited without exit status or exit signal\n#2: read tcp x.x.x.x:52832->x.x.x.4:22: read: connection timed out\
After this error I can no longer connect to SSH (port 22) on the VM. A tcpdump on the VM shows the SYN packets arriving but never being ACKed. I suspect some part of the docker-ee installation is adding iptables or firewall rules that drop these packets.
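The tcpdump observation is the key detail: a port with nothing listening normally answers a SYN with a RST, whereas a SYN that gets no reply at all points at something silently dropping it. A small client-side probe makes that difference visible without a packet capture. This is a sketch built on Python's socket.create_connection; probe_tcp is a hypothetical helper name and the 5-second timeout is an arbitrary choice.

```python
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Classify a single TCP connect attempt.

    "open"    - the handshake completed (SYN answered with SYN/ACK)
    "refused" - the host actively said no: a RST from a closed port, or an
                iptables REJECT that answers with ICMP port-unreachable
    "timeout" - no answer at all, consistent with SYNs being silently dropped
    "error: <reason>" - anything else (no route, DNS failure, ...)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

A result of "timeout" for the VM's address on port 22 would match the unanswered SYNs described here, while "refused" would mean the packets are reaching a host that rejects them.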
Do you have any custom iptables rules configured in the images?
No, and the connection only breaks during launchpad apply. It would be great to have some example code for Azure that we could test, since we've most likely missed something that cloud provider requires.
PR #53 has a Terraform config that seems to work on Azure, except that Windows machines need to be rebooted after the engine install, a step that is not yet included in any released version of launchpad.
Was this resolved?
I will test with the Azure example and open a new ticket if the issue recurs. Thanks.
When using launchpad to deploy Docker EE onto Ubuntu VMs in the Azure public cloud, the installation hangs at the following point:
After this it is no longer possible to SSH into the VM. A packet capture on the VM (via the serial console) shows that the SSH TCP SYN packets are not being ACKed. Is the Docker EE install adding firewall or iptables rules that cause this?