docker / cli


device mount fails under remote context in MacOS with 19.03.5 docker CLI #2280

Open sureshsankaran opened 4 years ago

sureshsankaran commented 4 years ago

BUG REPORT INFORMATION

When I try to mount a device under a remote context on macOS with the 19.03.5 Docker CLI, I get the error below.

$ docker run -d --device=/dev/ttySerial:/dev/ttySerial docker_image
docker: unknown server OS: .
See 'docker run --help'.

Description

Unable to mount a device with the 19.03.5 Docker CLI under a remote context. I have set up the Docker CLI to talk to a remote Docker engine over a TCP socket. All commands work except running a container with a device bind-mounted. I didn't see this issue in 18.09.

Steps to reproduce the issue:

  1. export DOCKER_HOST=tcp:/<>
  2. docker run -d --device=<some_device_on_remote_host> docker_image

Describe the results you received:

docker: unknown server OS: .
See 'docker run --help'.

Describe the results you expected: I expected the device on the remote host to be mounted successfully into the container.

Output of docker version:

$ docker version
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.37
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:22:34 2019
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: <---Remote host
 Engine:
  Version:          18.03.0
  API version:      1.37 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       708b068d3095c6a6be939eb2da78c921d2e945e2
  Built:            Fri Dec 13 03:54:50 2019
  OS/Arch:          linux/arm64
  Experimental:     false

Output of docker info:

$ docker info
Client:
 Debug Mode: false

Server:
 Containers: 10
  Running: 7
  Paused: 0
  Stopped: 3
 Images: 1
 Server Version: 18.03.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c.m (expected: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c)
 runc version: 751f18de2af90495e9c5665b95bfc7adf66ddd57-dirty (expected: 4fc53a81fb7c994640722ac585fa9ca548971871)
 init version: N/A (expected: )
 Security Options:
  userns
 Kernel Version: 4.4.206-armada-17.10.1
 Operating System: <unknown> (containerized)
 OSType: linux
 Architecture: aarch64
 CPUs: 2
 Total Memory: 3.851GiB
 Name: remotedevice
 ID: OF5G:OZRO:WY6E:K3E2:J45I:5ZSE:7ZWO:XJPU:7DQX:MOQS:TVBK:NUHV
 Docker Root Dir: /mnt/xyz/docker/1000000.1000000
 Debug Mode: true
  File Descriptors: 59
  Goroutines: 66
  System Time: 2020-01-22T21:53:35.25600644Z
  EventsListeners: 0
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.): The Docker CLI was run on macOS Mojave against a remote Docker engine on an armv8 Linux machine.

sureshsankaran commented 4 years ago

It looks like the commit below caused this issue: https://github.com/docker/docker-ce/commit/eb973f58a00c48bcde97f61a7903b8d474f6c6c0

https://github.com/docker/docker-ce/blame/master/components/cli/cli/command/container/opts.go#L971

thaJeztah commented 4 years ago

We should probably make it default to linux if no OSType was found (or when talking to an older API version)
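The fallback suggested above can be sketched as follows. This is a minimal Python illustration, not the CLI's actual Go code: `parse_device` and its `default_os` parameter are hypothetical names, and the function returns a label rather than a real device mapping. It shows how an empty OS string would yield the reported message, and how defaulting to `linux` would avoid it.

```python
from typing import Optional


def parse_device(device: str, server_os: str, default_os: Optional[str] = None) -> str:
    """Dispatch --device parsing on the OS type reported by the daemon.

    Illustrative only: returns a label instead of a parsed device mapping.
    """
    # Hypothetical fallback: use `default_os` when the daemon's OS is unknown.
    effective_os = server_os or default_os or ""
    if effective_os == "linux":
        return f"linux device: {device}"
    if effective_os == "windows":
        return f"windows device: {device}"
    # With an empty OS string the message ends in ": ", as in the report.
    raise ValueError(f"unknown server OS: {effective_os}")


# Without a fallback, an empty OS type reproduces the reported message.
try:
    parse_device("/dev/ttySerial:/dev/ttySerial", "")
except ValueError as err:
    message = str(err)

# With the suggested default, the same input is treated as a Linux device.
fallback = parse_device("/dev/ttySerial:/dev/ttySerial", "", default_os="linux")
```

The trade-off of such a default is that a Windows daemon behind a misbehaving proxy would be parsed with Linux rules, so this is only a sketch of the idea, not a recommendation.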

The code to detect the OSType didn't change between v18.09 and v19.03; both use the same; https://github.com/docker/cli/blob/v18.09.0/cli/command/cli.go#L212-L230

Curious though why it's not able to get the API version; the OSType header has been there for a while (latest change in that area was in https://github.com/moby/moby/pull/35151, which was well before 18.03)

Would you be able to perform a GET on the /_ping endpoint for your remote daemon?

(e.g. `curl -v 'https:///_ping'`)?

This is what's returned from my local Docker for Mac daemon;

curl -v --unix-socket /var/run/docker.sock 'http://localhost/_ping'

*   Trying /var/run/docker.sock:0...
* Connected to localhost (docker.sock) port 80 (#0)
> GET /_ping HTTP/1.1
> Host: localhost
> User-Agent: curl/7.65.3
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Api-Version: 1.40
< Cache-Control: no-cache, no-store, must-revalidate
< Content-Length: 2
< Content-Type: text/plain; charset=utf-8
< Date: Wed, 22 Jan 2020 22:59:30 GMT
< Docker-Experimental: true
< Ostype: linux
< Pragma: no-cache
< Server: Docker/19.03.5 (linux)
<
* Connection #0 to host localhost left intact
OK

sureshsankaran commented 4 years ago

I do see the ostype retrieved correctly!!

$ curl -v -k  https://172.27.180.12:443/v1.37/_ping
*   Trying 172.27.180.12...
...
> GET /v1.37/_ping HTTP/1.1
> Host: 172.27.180.12
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Thu, 23 Jan 2020 05:32:30 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 2
< Connection: keep-alive
< Set-Cookie: 
< Cache-Control: private, no-cache, no-store, must-revalidate
< Pragma: no-cache
< Expires: 0
< Api-Version: 1.37
< Docker-Experimental: false
**< Ostype: linux**
< X-Frame-Options: SAMEORIGIN
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
<
* Connection #0 to host 172.27.180.12 left intact

sureshsankaran commented 4 years ago

I just tried downgrading docker cli to 18.09 on my mac and it works fine with device mount under remote context.

$ docker version
Client: Docker Engine - Community
 Version:           18.09.0
 API version:       1.37
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov 7 00:47:43 2018
 OS/Arch:           darwin/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.03.0
  API version:      1.37 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       708b068d3095c6a6be939eb2da78c921d2e945e2
  Built:            Fri Dec 13 03:54:50 2019
  OS/Arch:          linux/arm64
  Experimental:     false

sureshsankaran commented 4 years ago

Do you think the issue could be due to the 19.03 CLI not properly handling cases where nginx reverse-proxies the requests to the remote Docker server, so that serverOS is set incorrectly?

thaJeztah commented 4 years ago

One thing I was considering is that 19.03 does a HEAD /_ping first, and if that fails, falls back to GET /_ping (it's possible that that could be related).

Not sure if the nginx reverse proxy makes a difference (but not excluding that possibility). I was about to spin up a machine and install docker 18.03, then try with a 19.03 client

But you could try running a client in a container on your setup;

docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock docker:19.03 sh

That should start a container with a docker 19.03 cli, which has direct access to the API using the socket. wondering if that also reproduces the issue if you try to start the container from within there?

sureshsankaran commented 4 years ago

@thaJeztah I tried same device mount operations as you suggested with running docker cli:19.03 inside the container on the target device with sock volume mounted. I DID NOT see the same device mount issue.

sureshsankaran commented 4 years ago

This issue is not limited to macOS. I hit the same error when I tried the 19.03.5 Docker CLI from Ubuntu 16.04.

thaJeztah commented 4 years ago

oh, whoops, I wanted to reply but forgot

> I tried same device mount operations as you suggested with running docker cli:19.03 inside the container on the target device with sock volume mounted. I DID NOT see the same device mount issue.

OK, that's interesting. That means that the problem occurs over the tcp:// connection, but not when connecting to the socket.

> Do you think the issue could be due to the 19.03 CLI not properly handling cases where nginx reverse-proxies the requests to the remote Docker server, so that serverOS is set incorrectly?

If there's a proxy in between, then that could play a role. I think it should be possible to check that theory; I assume the nginx reverse proxy is connecting with the daemon through tcp:// ? So if the daemon is listening on 0.0.0.0:<some port>, then it should be possible to connect directly with it from within a container on that machine (if the container is running in the host networking namespace)

Similar to the previous test; could you start a container but this time without bind-mounting the socket, but instead running it in the host networking namespace?

docker run -it --rm --network=host docker:19.03 sh

Then, from within that container try to reproduce the issue (using tcp://127.0.0.1:<port on which the daemon listens> to connect to the daemon);

export DOCKER_HOST=tcp://127.0.0.1:<port on which the daemon listens>

# test if connecting works ok
docker version
docker info
docker run -d --device=<some_device_on_remote_host> docker_image

If that works, then the problem looks to be with the reverse proxy

sureshsankaran commented 4 years ago

Thanks @thaJeztah for looking into this issue.

I just tried, as you suggested above, connecting over TCP from a 19.03 Docker client inside a container (netns=host) on the target device. The device was mounted successfully for the nested container.

So it is most likely going wrong when the reverse proxy is in between.

I wonder what the 19.03 Docker CLI does differently compared to older versions, because I don't see this issue with the 18.09 or older Docker CLI behind the reverse proxy.

sureshsankaran commented 4 years ago

^^ correction - not nested, container will be created by host docker engine only

thaJeztah commented 4 years ago

> ^^ correction - not nested, container will be created by host docker engine only

Yes, correct; it's connecting to the daemon on the host, so the container that's created is not running "docker in docker"

> I wonder what the 19.03 Docker CLI does differently compared to older versions, because I don't see this issue with the 18.09 or older Docker CLI behind the reverse proxy.

My main suspect would be the HEAD /_ping -> fallback to GET /_ping. Perhaps the reverse proxy returns a response that's causing the CLI to not continue with the GET fallback. The original implementation would continue on any non-200 and non-500 response (https://github.com/moby/moby/pull/38570), but https://github.com/moby/moby/pull/39206 made a change to not continue on a connection error.
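To make the suspected flow concrete, here is a small Python sketch of a HEAD-then-GET ping. It is a model under stated assumptions, not the Docker client's code: the `Transport` callable, the three fake servers, the 404 status codes, and the choice to read headers regardless of status are all hypothetical. It shows how a proxy that answers `/_ping` itself, without Docker's headers, could leave the OS type empty.

```python
from typing import Callable, Dict, Tuple

# A transport takes (method, path) and returns (status, headers).
# It raises ConnectionError when the server cannot be reached.
Transport = Callable[[str, str], Tuple[int, Dict[str, str]]]


def ping(transport: Transport) -> Dict[str, str]:
    """Try HEAD /_ping first, then fall back to GET /_ping."""
    # A ConnectionError propagates here: no GET fallback in that case.
    status, headers = transport("HEAD", "/_ping")
    if status != 200:
        status, headers = transport("GET", "/_ping")
    # Assumption: headers are read whatever the status is, so an error
    # page without Docker headers yields empty values.
    # (Real HTTP header names are case-insensitive; this dict is not.)
    return {
        "api_version": headers.get("Api-Version", ""),
        "os_type": headers.get("Ostype", ""),
        "status": str(status),
    }


def direct_daemon(method: str, path: str) -> Tuple[int, Dict[str, str]]:
    # Hypothetical daemon that answers both HEAD and GET.
    return 200, {"Api-Version": "1.37", "Ostype": "linux"}


def older_daemon(method: str, path: str) -> Tuple[int, Dict[str, str]]:
    # Hypothetical daemon that rejects HEAD but answers GET.
    if method == "HEAD":
        return 404, {}
    return 200, {"Api-Version": "1.37", "Ostype": "linux"}


def proxy_without_ping_route(method: str, path: str) -> Tuple[int, Dict[str, str]]:
    # Hypothetical proxy that does not relay /_ping to the daemon.
    return 404, {"Server": "nginx"}


direct = ping(direct_daemon)
older = ping(older_daemon)
proxied = ping(proxy_without_ping_route)
```

In this model both `direct` and `older` end up with an OS type of `linux`, while `proxied` has an empty one, which is the condition under which a `--device` flag could not be parsed.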

What response do you get if you try to perform a HEAD request with curl (connecting to the reverse proxy)?

curl -X HEAD -i  https://172.27.180.12:443/_ping

or

curl -i --head https://172.27.180.12:443/_ping

sureshsankaran commented 4 years ago

Thanks a lot @thaJeztah, I found the root cause. The nginx reverse proxy didn't have the necessary config to relay the /_ping request to the Docker server. Once I added that nginx config, things worked fine with the 19.03 Docker CLI.

Really appreciate your help!!!

sureshsankaran commented 4 years ago

BTW, the nginx reverse proxy did handle the _ping request when the URL included the Docker API version: v1.37/_ping works, but the unversioned /_ping was not handled.
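One way such behavior could arise is a proxy rule that only matches version-prefixed paths. The Python sketch below models that with regular expressions; the actual nginx configuration is not shown in this thread, so both patterns and the `is_relayed` helper are hypothetical.

```python
import re

# Hypothetical proxy rule: only versioned API paths reach the daemon.
VERSIONED_ONLY = re.compile(r"^/v\d+\.\d+/")

# Hypothetical corrected rule for _ping: the version prefix is optional.
PING_ANY_VERSION = re.compile(r"^/(v\d+\.\d+/)?_ping$")


def is_relayed(rule: "re.Pattern[str]", path: str) -> bool:
    """Return True if the proxy rule would forward this path."""
    return rule.match(path) is not None


versioned_ok = is_relayed(VERSIONED_ONLY, "/v1.37/_ping")
unversioned_ok = is_relayed(VERSIONED_ONLY, "/_ping")
fixed_ok = is_relayed(PING_ANY_VERSION, "/_ping")
```

Under the first rule only the versioned path is forwarded, which matches the observation that `curl` against `/v1.37/_ping` returned Docker's headers while the CLI's unversioned request did not.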

sureshsankaran commented 4 years ago

Why not use the API version for _ping? That would make it easier to set up the reverse proxy's relay config for that version, instead of for an individual URL endpoint.

thaJeztah commented 4 years ago

> Why not use the API version for _ping? That would make it easier to set up the reverse proxy's relay config for that version, instead of for an individual URL endpoint.

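When making that request, the client does not yet know what API versions are supported by the daemon. The /_ping endpoint is used to discover the API version; the client calls the /_ping endpoint and, based on the response, learns things such as the daemon's API version, its OS type, and whether experimental features are enabled (the Api-Version, Ostype, and Docker-Experimental headers in the _ping output earlier in this thread).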

With that information, the client can (if needed) downgrade to an older API version, enable/disable certain features etc.
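The downgrade step can be illustrated with a short Python sketch. It assumes the client simply picks the lower of its own maximum API version and the version the daemon advertises; `parse_version` and `negotiate` are hypothetical helpers, not Docker client functions. The values mirror this thread, where a 19.03 client (API 1.40) reports API version 1.37 against the 18.03 daemon.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.37' into (1, 37) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def negotiate(client_max: str, server_version: str) -> str:
    """Pick the lower of the client's maximum and the daemon's version."""
    if parse_version(server_version) < parse_version(client_max):
        return server_version
    return client_max


# 19.03 client (API 1.40) talking to the 18.03 daemon (API 1.37).
negotiated = negotiate("1.40", "1.37")
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.9" would sort after "1.37", which would be the wrong order.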

adityai commented 3 years ago

Interestingly, I got the same error, 'docker: unknown server OS', when I executed the following command against a Red Hat Enterprise Linux 7 host running 'Docker version 1.13.1, build 64e9980/1.13.1'. I executed this remotely from my Windows 10 laptop, which has version 20.10.0.

docker run --publish=18080:8080 -d --restart=always --name=cadvisor --privileged --network=host --device=/dev/kmsg gcr.io/cadvisor/cadvisor

When I execute the same command against a Docker server running on my Windows 10 laptop with 'Docker version 20.10.0, build 7287ab3', the container runs without any issues.

I logged into the RHEL Docker server and executed the same command from a shell, and it worked fine.

So perhaps a mismatch between the CLI version and the remotely hosted version is causing this issue?