JonasProgrammer / docker-machine-driver-hetzner

Docker machine driver for the new hetzner cloud API
https://jonasprogrammer.github.io/docker-machine-driver-hetzner/
MIT License
430 stars, 52 forks

Error creating machine #114

Open julianboehne opened 11 months ago

julianboehne commented 11 months ago

Hello, I used this driver on different autoscaling images like this one. I'm running Docker locally on my Windows system, and an error occurs when I run the command:

docker-machine create \
  --driver hetzner \
  --hetzner-api-token=******** \
  --hetzner-server-location=fsn1 \
  --hetzner-image=ubuntu-22.04 \
  --hetzner-server-type=cx11 \
  GitLab-Docker-Machine

The server starts on the Hetzner Cloud, but I get this SSH error:

Error creating machine: Error running provisioning: Error running "DEBIAN_FRONTEND=noninteractive sudo -E apt-get install -y  curl": ssh command error:
command : DEBIAN_FRONTEND=noninteractive sudo -E apt-get install -y  curl
err     : exit status 255

I also tried to connect to the server inside the docker container and this works well with ssh. What can I do?

JonasProgrammer commented 11 months ago

Hi, the exit status comes from the commands run by docker-machine to provision an already existing server, i.e. the driver has nothing to do with this.

Can you try running DEBIAN_FRONTEND=noninteractive sudo -E apt-get install -y curl directly on the working SSH connection? Perhaps you'll get more error output then.

julianboehne commented 11 months ago

Alright, I tested the command directly with ssh and it works well. I searched for this command in the docker-machine repo and found it there. But why does this command work over plain ssh and fail with docker-machine? When I try to debug in docker with docker-machine -D, I get the following new error:

About to run SSH command:

                if ! grep -xq '.*\sGitLab-Docker-Machine' /etc/hosts; then
                        if grep -xq '127.0.1.1\s.*' /etc/hosts; then
                                sudo sed -i 's/^127.0.1.1\s.*/127.0.1.1 GitLab-Docker-Machine/g' /etc/hosts;
                        else 
                                echo '127.0.1.1 GitLab-Docker-Machine' | sudo tee -a /etc/hosts; 
                        fi
                fi
SSH cmd err, output: <nil>:

JonasProgrammer commented 11 months ago

Very strange indeed.

Can you perhaps try docker-machine ssh after the server was created? You don't actually need to wait for the provisioning process to fail, just don't kill it right after creation (waiting for the next output after 'Waiting for the server to come up' should be fine). Despite not being provisioned, the machine's access credentials etc. should be available at this stage, so docker-machine ssh should work (fingers crossed).

If the command runs successfully even then, I'm out of ideas right now. I have seen a fair share of driver-related stuff, but heisenbugs involving seemingly simple shell commands are a first...

julianboehne commented 11 months ago

No more ideas,

I stopped the creation process before the "Detecting the provisioner..." step. I tried the docker-machine ssh command and everything works fine. Thanks for the great help, but at the moment I don't have any new ideas to fix this issue.

mrjackv commented 11 months ago

@julianboehne I encountered the same issue yesterday and figured out the problem. It's a combination of:

That means that every time docker-machine tries to issue a command via ssh, there's a good chance it'll be dropped. I've "solved" the problem by using the following cloud-init:

#cloud-config
package_update: true
packages:
  - fail2ban
bootcmd:
  # Temporarily disable ssh, otherwise docker-machine will try to install
  # its stuff before we're done running the cloud-init
  - systemctl disable --now ssh.service
write_files:
- path: /etc/fail2ban/jail.local
  content: |
    [sshd]
    enabled = true
    mode = aggressive
- path: /etc/ssh/sshd_config.d/custom.conf
  content: |
    MaxStartups 300:30:1000
    PasswordAuthentication no
runcmd:
  - systemctl enable --now fail2ban.service
  - systemctl enable --now ssh.service

julianboehne commented 11 months ago

@mrjackv where do I find the cloud-init, or where do I need to create it?

mrjackv commented 11 months ago

You need to save the contents to a file and then use the command line option --hetzner-user-data-file=<path to file> when running docker-machine create
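
A minimal sketch of that step, under a few assumptions: the file location and the HETZNER_API_TOKEN environment variable are made up for illustration, and the cloud-config is abbreviated (the full one is posted earlier in this thread). Only the flags already used in this issue appear. An absolute path is used so the result does not depend on the current working directory, and the create command is only printed, not executed:

```shell
#!/bin/sh
# Hypothetical location for the user-data file; any readable path works.
USER_DATA_FILE="${TMPDIR:-/tmp}/hetzner-cloud-init.yml"

# Write the cloud-config to that file (abbreviated here for brevity).
cat > "$USER_DATA_FILE" <<'EOF'
#cloud-config
package_update: true
packages:
  - fail2ban
EOF

# Build the create command as text so it can be reviewed before running.
# HETZNER_API_TOKEN is an assumed environment variable holding the token;
# it is left unexpanded so the token is not printed.
CREATE_CMD="docker-machine create --driver hetzner --hetzner-user-data-file=$USER_DATA_FILE --hetzner-api-token=\$HETZNER_API_TOKEN --hetzner-server-location=fsn1 --hetzner-image=ubuntu-22.04 --hetzner-server-type=cx11 GitLab-Docker-Machine"
echo "$CREATE_CMD"
```

Once the printed command looks right, it can be pasted into the shell that has docker-machine and the token available.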

julianboehne commented 11 months ago

I tried it like this:

docker-machine create \
  --driver hetzner \
  --hetzner-user-data-file=usr/cloud/cloud-init \
  --hetzner-api-token=*********** \
  --hetzner-server-location=fsn1 \
  --hetzner-image=ubuntu-22.04 \
  --hetzner-server-type=cx11 \
  GitLab-Docker-Machine

But it didn't work. Did I misunderstand something?

JonasProgrammer commented 11 months ago

Thanks for the observation @mrjackv. Despite maintaining the driver I run mostly on Hetzner metal, so I don't always have an insight as to what is currently happening in the cloud world. Therefore having some input of the 'front users' is invaluable.

@julianboehne Did you get any kind of error message or did it just exhibit the same behavior as initially described in this issue?

julianboehne commented 10 months ago

After trying to create the docker-machine, I observed the following error when running docker-machine ls:

NAME                    ACTIVE   DRIVER    STATE     URL                       SWARM   DOCKER    ERRORS
GitLab-Docker-Machine   -        hetzner   Running   tcp://49.13****:2376           Unknown   Unable to query docker version: Cannot connect to the docker engine endpoint

Some additional information:

JonasProgrammer commented 10 months ago

Can you try to SSH into the machine after creation to see whether the docker daemon is actually running (systemctl status docker.service) or something similar?

Other than that, do you have a firewall configured (either on the machine itself or via Hetzner)? Could you also post the output of systemctl cat docker.service and systemctl cat docker.socket?

julianboehne commented 10 months ago

Docker is not running on the Hetzner Server I created. To my knowledge, I have not configured any firewalls.

root@GitLab-Docker-Machine:~# systemctl status docker.service
Unit docker.service could not be found.
root@GitLab-Docker-Machine:~# systemctl cat docker.service
No files found for docker.service.
root@GitLab-Docker-Machine:~# systemctl cat docker.socket
No files found for docker.socket.
gschafra commented 7 months ago

Debug output when doing docker-machine -D regenerate-certs GitLab-Docker-Machine -f:

package: action=install name=curl
(GitLab-Docker-Machine) Calling .GetSSHHostname
(GitLab-Docker-Machine) Calling .GetSSHPort
(GitLab-Docker-Machine) Calling .GetSSHKeyPath
(GitLab-Docker-Machine) Calling .GetSSHKeyPath
(GitLab-Docker-Machine) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /root/.docker/machine/machines/GitLab-Docker-Machine/id_rsa (-rw-------)
&{[-F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none -o LogLevel=quiet -o PasswordAuthentication=no -o ServerAliveInterval=60 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@XXX.XXX.XXX.XXX -o IdentitiesOnly=yes -i /root/.docker/machine/machines/GitLab-Docker-Machine/id_rsa -p 22] /usr/bin/ssh <nil>}
About to run SSH command:
DEBIAN_FRONTEND=noninteractive sudo -E apt-get install -y  curl
SSH cmd err, output: exit status 255: 
Error running "DEBIAN_FRONTEND=noninteractive sudo -E apt-get install -y  curl": ssh command error:
command : DEBIAN_FRONTEND=noninteractive sudo -E apt-get install -y  curl
err     : exit status 255
output  :

Doing the corresponding ssh command directly in the terminal works like a charm
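
One detail that may help when comparing the two runs: ssh exits with 255 when ssh itself hits an error, rather than passing through the remote command's exit status, and the debug output above shows docker-machine passing -o LogLevel=quiet, which suppresses ssh's own error message. The following is a rough sketch for replaying the same invocation with logging turned up. The helper name build_ssh_cmd is hypothetical, the host is the masked placeholder from the log, and the options are copied from the debug output with only LogLevel changed. It prints the command instead of connecting:

```shell
#!/bin/sh
# Hypothetical helper: prints the ssh invocation seen in the debug log,
# with LogLevel raised from quiet to DEBUG so ssh reports why it fails.
build_ssh_cmd() {
  host="$1"  # server address (masked as XXX.XXX.XXX.XXX in the log)
  key="$2"   # private key path from the "Using SSH private key" line
  # printf reuses the '%s ' format for every argument, joining them with spaces.
  printf '%s ' \
    "ssh -F /dev/null -o ConnectionAttempts=3 -o ConnectTimeout=10" \
    "-o ControlMaster=no -o ControlPath=none -o LogLevel=DEBUG" \
    "-o PasswordAuthentication=no -o ServerAliveInterval=60" \
    "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" \
    "-o IdentitiesOnly=yes -i $key -p 22 root@$host"
  # The remote command, quoted so it reaches the server as one string.
  printf '%s\n' "'DEBIAN_FRONTEND=noninteractive sudo -E apt-get install -y curl'"
}

SSH_CMD=$(build_ssh_cmd "XXX.XXX.XXX.XXX" \
  "/root/.docker/machine/machines/GitLab-Docker-Machine/id_rsa")
echo "$SSH_CMD"
```

Running the printed command several times in a row, from the same container that runs docker-machine, should show whether the connection is being refused intermittently before the remote command ever starts.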