docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Docker --cpuset-mems=1 does not work - cannot bind NUMA node 1 #863

Open · peiniliu opened this issue 4 years ago

peiniliu commented 4 years ago

NUMA NODE

[xpliu@nxt2025 ~]$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
node 0 size: 130365 MB
node 0 free: 98866 MB
node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 1 size: 131072 MB
node 1 free: 100554 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 
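
The same node-to-CPU mapping can be cross-checked with lscpu (shown for reference only; output omitted, since it matches the numactl listing above):

[xpliu@nxt2025 ~]$ lscpu | grep -i numa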

Expected behavior

When running the container on NUMA node 0, the output shows correct memory bindings :)

[xpliu@nxt2025 exec-nas]$ docker run -d -t --privileged=true --cpus=8 --cpuset-cpus=0-7 --cpuset-mems=0 --hostname=de2-CPUSETNUMA-1 --name de2-CPUSETNUMA-1 mpinasrdma /bin/bash
[xpliu@nxt2025 ~]$ docker exec -it de2-CPUSETNUMA-1 ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  18224  1892 pts/0    Ss+  17:42   0:00 /bin/bash
root        37  0.0  0.0  65500  1144 ?        Ss   17:42   0:00 /usr/sbin/sshd
root        73  0.0  0.0  34412  1432 pts/1    Rs+  18:03   0:00 ps aux

[xpliu@nxt2025 exec-nas]$ docker exec de2-CPUSETNUMA-1 numastat -p 1
Per-node process memory usage (in MBs) for PID 1 (bash)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.25            0.00            0.25
Stack                        0.03            0.00            0.03
Private                      1.62            0.00            1.62
                   --------------- --------------- ---------------
Total                        1.91            0.00            1.91
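
(For reference, the cpuset assignment itself can also be read back through the cgroup filesystem; a sketch assuming the cgroup v1 layout used by the cgroupfs driver reported in docker info below, with docker inspect only resolving the full container ID:)

[xpliu@nxt2025 ~]$ cat /sys/fs/cgroup/cpuset/docker/$(docker inspect -f '{{.Id}}' de2-CPUSETNUMA-1)/cpuset.mems   # expected to print 0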

I also started an SSH daemon inside this container; its memory affinity is correctly bound. :)

[xpliu@nxt2025 exec-nas]$ docker exec de2-CPUSETNUMA-1 numastat -p 37
Per-node process memory usage (in MBs) for PID 37 (sshd)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.06            0.00            0.06
Stack                        0.02            0.00            0.02
Private                      1.12            0.00            1.12
                   --------------- --------------- ---------------
Total                        1.21            0.00            1.21

Actual behavior

When running the container on NUMA node 1, the processes inside are not using only the requested NUMA node.

[xpliu@nxt2025 exec-nas]$ docker run -d -t --privileged=true --cpus=8 --cpuset-cpus=18-25 --cpuset-mems=1 --hostname=de2-CPUSETNUMA-2 --name de2-CPUSETNUMA-2 mpinasrdma /bin/bash
[xpliu@nxt2025 ~]$ docker exec -it de2-CPUSETNUMA-2 ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  18224  1896 pts/0    Ss+  17:42   0:00 /bin/bash
root        38  0.0  0.0  65500  1148 ?        Ss   17:42   0:00 /usr/sbin/sshd
root        69  0.0  0.0  34412  1428 pts/1    Rs+  18:04   0:00 ps aux

[xpliu@nxt2025 exec-nas]$ docker exec de2-CPUSETNUMA-2 numastat -p 1
Per-node process memory usage (in MBs) for PID 1 (bash)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.00            0.25            0.25
Stack                        0.00            0.03            0.03
Private                      1.44            0.18            1.62
                   --------------- --------------- ---------------
Total                        1.44            0.46            1.91

[xpliu@nxt2025 exec-nas]$ docker exec de2-CPUSETNUMA-2 numastat -p 38
Per-node process memory usage (in MBs) for PID 38 (sshd)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                         0.00            0.00            0.00
Heap                         0.00            0.06            0.06
Stack                        0.00            0.02            0.02
Private                      0.51            0.61            1.12
                   --------------- --------------- ---------------
Total                        0.51            0.70            1.21
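
(To see which mappings account for the Node 0 pages, /proc/<pid>/numa_maps can be inspected from inside the container; every mapping with pages on node 0 carries an N0=<pages> field. A sketch, assuming grep exists in the image; output omitted:)

[xpliu@nxt2025 ~]$ docker exec de2-CPUSETNUMA-2 grep N0 /proc/1/numa_maps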

Output of docker version:

[xpliu@nxt2025 ~]$ docker version
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:25:41 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea
  Built:            Wed Nov 13 07:24:18 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 11
  Running: 8
  Paused: 0
  Stopped: 3
 Images: 26
 Server Version: 19.03.5
 Storage Driver: overlay2
  Backing Filesystem: tmpfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-957.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 72
 Total Memory: 251.2GiB
 Name: nxt2025.hpc.eu.lenovo.com
 ID: QOE2:X7W3:CNQU:YEIB:Z6NT:URQQ:OQGJ:JQVF:R4VK:2ENT:GKC3:DIKU
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: peiniliu
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.)

apinheiro commented 2 years ago

I have the same problem. My host has 80 cores and the Docker process runs on all cores, even if I configure it to run on only 2 cores.
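
(A quick way to check the CPU affinity that was actually applied is to read it from /proc inside the container; a sketch, with <container> as a placeholder for the container name:)

$ docker exec <container> grep Cpus_allowed_list /proc/1/status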

usstq commented 5 months ago

I guess this is Linux's default behavior (if we use numactl to start a bash instead of using Docker and repeat the test, we get the same results). If we check /proc/<pid>/numa_maps, we find that it is the code sections of common shared libraries that are using N0, which is reasonable: file-backed pages already resident in the page cache on node 0 are simply shared, and the memory policy does not migrate them.
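
(A minimal sketch of that comparison outside Docker, assuming numactl and numastat are installed on the host:)

$ numactl --cpunodebind=1 --membind=1 bash    # start a shell bound to node 1 for both CPU and memory
$ numastat -p $$                              # inside that shell: per-node memory usage of the shell itself
$ grep N0 /proc/$$/numa_maps                  # mappings that still hold pages on node 0 (N0=... fields)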