JonasProgrammer / docker-machine-driver-hetzner

Docker machine driver for the new hetzner cloud API
https://jonasprogrammer.github.io/docker-machine-driver-hetzner/
MIT License

Machine status can not be checked using drivers.MustBeRunning: unexpected EOF #47

Closed stearz closed 4 years ago

stearz commented 4 years ago

Hi,

first of all: Thank you for your work.

I am getting this error after the machines have been created and docker has already been installed via SSH:

2020/08/21 14:34:46 [INFO] [node-controller-rancher-machine] Installing Docker...
2020/08/21 14:35:21 [INFO] [node-controller-rancher-machine] Copying certs to the local machine directory...
2020/08/21 14:35:24 [INFO] [node-controller-rancher-machine] Copying certs to the remote machine...
2020/08/21 14:38:55 [INFO] [node-controller-rancher-machine] The default lines below are for a sh/bash shell, you can specify the shell you're using, with the --shell flag.
2020/08/21 14:38:55 [INFO] [node-controller-rancher-machine]
2020/08/21 14:38:55 [INFO] Generating and uploading node config master-1
2020/08/21 14:38:55 [ERROR] NodeController c-27sc8/m-njrrn [node-controller] failed with : Error creating machine: Error running provisioning: could not execute drivers.MustBeRunning: could not get server by ID: Get "https://api.hetzner.cloud/v1/servers/7281345": unexpected EOF
2020-08-21 14:39:03.661273 I | mvcc: store.index: compact 2857
2020-08-21 14:39:03.664539 I | mvcc: finished scheduled compaction at 2857 (took 1.6396ms)
Then the machines get destroyed and created again with the same error.

Here are some details about my environment:

Rancher version: v2.3.5, running with docker run [...] on macOS
Hetzner machine driver: docker-machine-driver-hetzner_2.1.0_linux_amd64.tar.gz (I also tried the darwin driver, but it did not install into Rancher)
OS on the Hetzner machine: Ubuntu 18.04

I can easily access https://api.hetzner.cloud/v1/servers/7281345 in Postman

JonasProgrammer commented 4 years ago

Hi,

that is a very strange error indeed. My best guess is that, for some reason, the HTTP connection to the Hetzner Cloud API is closed during that GET request; the EOF error at least points in that direction. But I agree that this is rather unusual, as I have never seen the API simply drop connections rather than return a proper success or failure response.

What throws me off here is the fact that the 'hard' part of provisioning seems to run flawlessly. Could you please verify that you can manually run docker-machine create, and perhaps a few commands that interact with the API, such as docker-machine ssh, docker-machine status and docker-machine stop? If that works, the root cause of the issue has got to be something else.

As a side note, I have absolutely no experience with Rancher, but Max is familiar with it. I just don't want to ping him unnecessarily if the issue is reproducible without it.

stearz commented 4 years ago

Hi,

I just tried to execute docker-machine create on one of the newly created machines but there is no docker-machine binary installed on the machine. If I run docker ps I get the error that docker is not able to communicate with the docker daemon through the unix socket. When running ps aux I see a running containerd though.

Any suggestions on what I can do next?

JonasProgrammer commented 4 years ago

Sorry, there seems to be some confusion: I wanted you to try and run docker-machine create from your own machine, rather than going through Rancher. The idea is just to verify that the error really comes from this driver and is not the result of some higher-level problem. So I'd still ask you to please go through the lifecycle of creating, interacting with and destroying a Hetzner Cloud machine with docker-machine only. While the error string does seem to come from the driver binary, I'm not yet sure whether this is a bug in the driver itself or at another level of the stack.

As for the other symptoms you described, as long as containerd and dockerd are up and running, provisioning itself seems to work. In your case, either your user on the machine was not in the docker group (or whatever group was given +w on the socket) or the daemon is not configured to expose the socket -- something I have not seen in the wild yet, but it is entirely possible to do.

stearz commented 4 years ago

Thank you for pointing me in the right direction. When I ran docker-machine create --driver hetzner [...] I had the same issue.

I was able to find the root cause of this error on my local machine: the antivirus software blocks the request (or its answer?) every time I try to create the machine with docker-machine and the Hetzner driver. After disabling the so-called "web protection", everything works fine, with docker-machine as well as from the Rancher container.

Thanks for helping me get this working. I am closing this issue now.