docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Running container fails with failed to add the host cannot allocate memory #1443

Open hpakniamina opened 2 years ago

hpakniamina commented 2 years ago

OS: Red Hat Enterprise Linux release 8.7 (Ootpa)

Version:

$ sudo yum list installed | grep docker
containerd.io.x86_64                         1.6.9-3.1.el8                               @docker-ce-stable
docker-ce.x86_64                             3:20.10.21-3.el8                            @docker-ce-stable
docker-ce-cli.x86_64                         1:20.10.21-3.el8                            @docker-ce-stable
docker-ce-rootless-extras.x86_64             20.10.21-3.el8                              @docker-ce-stable
docker-scan-plugin.x86_64                    0.21.0-3.el8                                @docker-ce-stable

Out of hundreds of docker calls made over several days, a few of them fail. This is the shape of the command line:

/usr/bin/docker run \
-u 1771:1771 \
-a stdout \
-a stderr \
-v /my_path:/data \
--rm \
my_image:latest my_entry --my_args

The failure:

docker: Error response from daemon: failed to create endpoint recursing_aryabhata on network bridge: failed to add the host (veth6ad97f8) <=> sandbox (veth23b66ce) pair interfaces: cannot allocate memory.

It is not easily reproducible; the failure rate is below one percent. At the time this error happens, the system has lots of free memory. Around the time of the failure, the application is making around 5 docker calls per second, and each call takes about 5 to 10 seconds to complete.
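A stopgap that would mask the transient failure is a small retry wrapper; a sketch, using the same placeholder names as above:

```bash
#!/bin/bash
# Retry the docker run a few times; the endpoint-creation failure is transient.
for attempt in 1 2 3; do
    /usr/bin/docker run \
        -u 1771:1771 \
        -a stdout -a stderr \
        -v /my_path:/data \
        --rm \
        my_image:latest my_entry --my_args && exit 0
    echo "docker run failed (attempt $attempt), retrying..." >&2
    sleep 2
done
exit 1
```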

pschoen-itsc commented 10 months ago

Setting the nr_cpus boot parameter resolved the issue for us permanently.
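Roughly what we did, for a GRUB-based Debian/Ubuntu guest (on RHEL you would pass the same argument via grubby; the value 4 here is just an example matching the VM's vCPU count):

```bash
# Add nr_cpus to the kernel command line and regenerate the GRUB config
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&nr_cpus=4 /' /etc/default/grub
sudo update-grub    # RHEL equivalent: grubby --update-kernel=ALL --args="nr_cpus=4"
sudo reboot
```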

JonasAlfredsson commented 10 months ago

Same goes for us: after doing the steps from my comment above, with nr_cpus set to the number of threads available to the system (grep -c processor /proc/cpuinfo), we haven't seen the previously hourly-occurring problem for 3 months straight.

attie-argentum commented 10 months ago

Thanks both for your responses - I've put that in place, and will report back if the issue continues! 🤞

mumbleskates commented 9 months ago

Having never seen this before, I just had two gitlab-ci containers (launched by the native runner, not the docker-in-docker one) fail with this error at the same time. Only one allocation failure was logged to dmesg (seen below). The system is also running zfs, and the system root (and docker) are on btrfs. Swap is disabled, and the system has many gigabytes of free memory both before and after the page cache and the zfs ARC.

root@erebor ~ # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy
root@erebor ~ # uname -a
Linux erebor 6.5.0-15-generic #15~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 12 18:54:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
root@erebor ~ # zfs version
zfs-2.2.2-1
zfs-kmod-2.2.2-1
root@erebor ~ # 
dmesg logs
[906739.889741] dockerd: page allocation failure: order:5, mode:0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=docker.service,mems_allowed=0
[906739.889765] CPU: 52 PID: 1207114 Comm: dockerd Tainted: P           OE      6.5.0-15-generic #15~22.04.1-Ubuntu
[906739.889772] Hardware name: ASUS System Product Name/Pro WS WRX80E-SAGE SE WIFI, BIOS 1003 02/18/2022
[906739.889776] Call Trace:
[906739.889780]  <TASK>
[906739.889786]  dump_stack_lvl+0x48/0x70
[906739.889796]  dump_stack+0x10/0x20
[906739.889801]  warn_alloc+0x174/0x1f0
[906739.889812]  ? __alloc_pages_direct_compact+0x20b/0x240
[906739.889822]  __alloc_pages_slowpath.constprop.0+0x914/0x9a0
[906739.889835]  __alloc_pages+0x31d/0x350
[906739.889847]  ? veth_dev_init+0x95/0x140 [veth]
[906739.889858]  __kmalloc_large_node+0x7e/0x160
[906739.889866]  __kmalloc.cold+0xc/0xa6
[906739.889875]  veth_dev_init+0x95/0x140 [veth]
[906739.889886]  register_netdevice+0x132/0x700
[906739.889895]  veth_newlink+0x190/0x480 [veth]
[906739.889931]  rtnl_newlink_create+0x170/0x3d0
[906739.889944]  __rtnl_newlink+0x70f/0x770
[906739.889959]  rtnl_newlink+0x48/0x80
[906739.889966]  rtnetlink_rcv_msg+0x170/0x430
[906739.889972]  ? srso_return_thunk+0x5/0x10
[906739.889980]  ? rmqueue+0x93d/0xf10
[906739.889985]  ? srso_return_thunk+0x5/0x10
[906739.889991]  ? __check_object_size.part.0+0x72/0x150
[906739.889999]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[906739.890005]  netlink_rcv_skb+0x5d/0x110
[906739.890020]  rtnetlink_rcv+0x15/0x30
[906739.890027]  netlink_unicast+0x1ae/0x2a0
[906739.890035]  netlink_sendmsg+0x25e/0x4e0
[906739.890047]  sock_sendmsg+0xcc/0xd0
[906739.890053]  __sys_sendto+0x151/0x1b0
[906739.890072]  __x64_sys_sendto+0x24/0x40
[906739.890078]  do_syscall_64+0x5b/0x90
[906739.890085]  ? srso_return_thunk+0x5/0x10
[906739.890091]  ? do_user_addr_fault+0x17a/0x6b0
[906739.890097]  ? srso_return_thunk+0x5/0x10
[906739.890102]  ? exit_to_user_mode_prepare+0x30/0xb0
[906739.890110]  ? srso_return_thunk+0x5/0x10
[906739.890116]  ? irqentry_exit_to_user_mode+0x17/0x20
[906739.890122]  ? srso_return_thunk+0x5/0x10
[906739.890128]  ? irqentry_exit+0x43/0x50
[906739.890133]  ? srso_return_thunk+0x5/0x10
[906739.890139]  ? exc_page_fault+0x94/0x1b0
[906739.890146]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[906739.890153] RIP: 0033:0x55d44da6700e
[906739.890190] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[906739.890194] RSP: 002b:000000c0013750c8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[906739.890201] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 000055d44da6700e
[906739.890206] RDX: 0000000000000074 RSI: 000000c001d0e880 RDI: 000000000000000c
[906739.890209] RBP: 000000c001375108 R08: 000000c0012a4910 R09: 000000000000000c
[906739.890213] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[906739.890216] R13: 000000c0016ba800 R14: 000000c00191c1a0 R15: 0000000000000011
[906739.890227]  </TASK>
[906739.890231] Mem-Info:
[906739.890239] active_anon:4026084 inactive_anon:4572236 isolated_anon:0
                 active_file:356682 inactive_file:3106746 isolated_file:0
                 unevictable:7026 dirty:361241 writeback:0
                 slab_reclaimable:417679 slab_unreclaimable:1060505
                 mapped:3338536 shmem:3269641 pagetables:30618
                 sec_pagetables:8669 bounce:0
                 kernel_misc_reclaimable:0
                 free:651883 free_pcp:319 free_cma:0
[906739.890250] Node 0 active_anon:16104336kB inactive_anon:18288944kB active_file:1426728kB inactive_file:12426984kB unevictable:28104kB isolated(anon):0kB isolated(file):0kB mapped:13354144kB dirty:1444964kB writeback:0kB shmem:13078564kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 4063232kB writeback_tmp:0kB kernel_stack:32736kB pagetables:122472kB sec_pagetables:34676kB all_unreclaimable? no
[906739.890262] Node 0 DMA free:11260kB boost:0kB min:0kB low:12kB high:24kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[906739.890274] lowmem_reserve[]: 0 2713 257385 257385 257385
[906739.890289] Node 0 DMA32 free:1022764kB boost:0kB min:712kB low:3488kB high:6264kB reserved_highatomic:32768KB active_anon:678072kB inactive_anon:29120kB active_file:0kB inactive_file:64kB unevictable:0kB writepending:0kB present:2977184kB managed:2910992kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[906739.890301] lowmem_reserve[]: 0 0 254671 254671 254671
[906739.890315] Node 0 Normal free:1574012kB boost:0kB min:66864kB low:327648kB high:588432kB reserved_highatomic:839680KB active_anon:15426264kB inactive_anon:18259824kB active_file:1426728kB inactive_file:12426920kB unevictable:28104kB writepending:1444964kB present:265275392kB managed:260792020kB mlocked:28104kB bounce:0kB free_pcp:744kB local_pcp:0kB free_cma:0kB
[906739.890328] lowmem_reserve[]: 0 0 0 0 0
[906739.890340] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 2*4096kB (M) = 11260kB
[906739.890389] Node 0 DMA32: 2173*4kB (UM) 987*8kB (UM) 586*16kB (UM) 374*32kB (UM) 666*64kB (UM) 451*128kB (UM) 295*256kB (UM) 136*512kB (UM) 60*1024kB (UM) 3*2048kB (M) 164*4096kB (UM) = 1022764kB
[906739.890440] Node 0 Normal: 22973*4kB (UME) 42920*8kB (UME) 30209*16kB (UMEH) 8817*32kB (UMEH) 5762*64kB (UMH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1569508kB
[906739.890481] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[906739.890485] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[906739.890489] 6735316 total pagecache pages
[906739.890492] 0 pages in swap cache
[906739.890495] Free swap  = 0kB
[906739.890497] Total swap = 0kB
[906739.890500] 67067142 pages RAM
[906739.890503] 0 pages HighMem/MovableOnly
[906739.890505] 1137549 pages reserved
[906739.890508] 0 pages hwpoisoned
dounoit commented 9 months ago

all - I've been struggling with this running on an OpenVZ VPS instance - there are 72 GB of RAM allocated and not much of it used.

The interesting thing is that I can't edit the sysctl vm.swappiness, which is set to 60 - I wanted to try setting it to 0, but I apparently don't have permission even though I'm obviously root.

I tried creating a swapfile and activating it - I get permission denied.

This is my first time deploying docker on this infrastructure. I'm trying to stack a bunch of containers on it, but I'm getting the OOM now and containers just create/restart/fail - I just got the same OOM when a container tries to join the network. I tried using docker-compose directly vs. docker stack for testing and got the error either way. I'll try the grub kernel flags - I sure hope this works! Thanks!

lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

docker info

Client: Docker Engine - Community
 Version:    25.0.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.5
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 19
  Running: 17
  Paused: 0
  Stopped: 2
 Images: 19
 Server Version: 25.0.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: active
  NodeID: x
  Is Manager: true
  ClusterID: x
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 185.185.126.69
  Manager Addresses:
   x.x.x.x:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.2.0
 Operating System: Ubuntu 22.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 72GiB
 Name: xxxxx
 ID: xxxx-xxx-xxx-xx-xxxx
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-ip6tables is disabled


this is interesting - no CPU? haha:

lscpu
Architecture:           x86_64
CPU op-mode(s):         32-bit, 64-bit
Address sizes:          46 bits physical, 48 bits virtual
Byte Order:             Little Endian
CPU(s):                 0
Vendor ID:              GenuineIntel
Model name:             Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
CPU family:             6
Model:                  62
Thread(s) per core:     0
Core(s) per socket:     0
Socket(s):              0
Stepping:               4
BogoMIPS:               4400.16
Flags:                  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu cpuid_faulting pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
Virtualization:         VT-x
Hypervisor vendor:      Parallels
Virtualization type:    container


and top shows 4 CPUs:

(screenshot of top output showing 4 CPUs)

shankerwangmiao commented 9 months ago

Hi, all

I have also hit this problem, and I may have identified the cause.

It might be because the kernel changed its default behavior: when a veth pair is created without explicitly specifying the number of rx and tx queues, it now creates one queue per possible CPU, whereas the original behavior was to create only one queue. Each queue requires 768 bytes of memory on one side of a veth pair, so servers with larger numbers of (possible) cores tend to hit this issue. I've reported the issue to the kernel mailing list.
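A rough back-of-envelope with the numbers above (768 bytes per queue, one queue per possible CPU, allocated as one contiguous block per side; the 240-CPU figure is just an example of an over-provisioned hypervisor):

```bash
# Example: a hypervisor advertising 240 possible (hot-pluggable) CPUs
echo $((240 * 768))   # 184320 bytes, ~180 KiB per side of the pair
# kmalloc needs physically contiguous pages and rounds the request up to a
# power-of-two number of pages, so this becomes a 256 KiB (order-6)
# allocation -- the kind of high-order allocation that fragmentation can
# make fail even when plenty of memory is "free".
```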

I wonder if docker could explicitly specify 1 tx and 1 rx queue when creating the veth pair to fix this?
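For illustration, iproute2 already lets you pin the queue count when creating a pair by hand; something along these lines is what I'm suggesting the runtime could do (the interface names are made up):

```bash
# Create a veth pair with exactly one TX and one RX queue,
# instead of one queue per possible CPU (the newer kernel default).
sudo ip link add veth-host numtxqueues 1 numrxqueues 1 \
    type veth peer name veth-ctr
ip -d link show veth-host    # detailed output includes the queue counts
sudo ip link del veth-host   # deleting one end removes the peer too
```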

CoyoteWAN commented 9 months ago

@shankerwangmiao I saw the patch to veth.c based on what you reported. Is there any way you can walk us through applying the patch to the veth module?

shankerwangmiao commented 9 months ago

@shankerwangmiao I saw the patch to veth.c based on what you reported. Is there any way you can walk us through applying the patch to the veth module?

The patch will be included in Linux 6.8 and backported to the Linux LTS versions, so I suggest waiting for the release of Linux 6.8 and the LTS releases, and then for the corresponding kernel release from your particular Linux distribution.

If you are really affected by this bug, I recommend downgrading your kernel to a version <= 5.14 provided by your Linux distribution.

TL;DR: Sticking to the kernel versions provided by your Linux distribution is always a wise choice; either wait (I'll update this information when such releases are available) or downgrade.

Updates:

The fix has been included in the following kernel LTS versions:

For Debian users:

For Ubuntu users:

If downgrading is not possible, and this must be fixed, the following procedure can be used to build a patched `veth.ko`. Please note that using a custom patched kernel module might lead to unexpected consequences and might be DANGEROUS if carried out by an inexperienced person. Always back up and run tests before mass deployment. Proceed at your OWN RISK.

1. Determine the current kernel version.
2. Download the source of the current kernel, and extract `veth.c` from `drivers/net/veth.c`. Alternatively, browse `https://elixir.bootlin.com/linux/latest/source/drivers/net/veth.c`, select the version in the left panel, and copy the source code from the right side.
3. Install the development package for the current kernel version, which is provided by the Linux distribution and contains the headers needed to build a kernel module. Confirm it is present by checking that `/lib/modules/$(uname -r)/build` exists.
4. Apply the patch https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/patch/?id=1ce7d306ea63f3e379557c79abd88052e0483813 to the extracted `veth.c`.
5. Prepare a kbuild file for building the module:
   ```
   obj-m += veth.o
   ```
6. Prepare the build environment:
   - Use a non-root user.
   - Create a new empty directory.
   - Put only two files in it: the patched `veth.c` and the above kbuild file, named `Kbuild`.
7. Build the patched kernel module:
   - Change into the directory created above.
   - Execute `make -C "/lib/modules/$(uname -r)/build/" M="$(pwd)" modules` there.
   - Ensure `veth.ko` was generated in that directory.
8. Install the patched kernel module:
   - Copy the generated `veth.ko` to `/lib/modules/$(uname -r)/updates`: `sudo install -Dm 644 veth.ko -t "/lib/modules/$(uname -r)/updates"`
   - Regenerate module dependencies: `sudo depmod "$(uname -r)"`
   - Ensure the original veth module is overridden: run `sudo modinfo -k "$(uname -r)" veth` and inspect the `filename:` field, which should point to the `veth.ko` in the `updates/` directory rather than the original in `kernel/drivers/net/`.
9. Replace the currently loaded veth module:
   - Stop all docker containers.
   - Stop `dockerd` (both the `docker.service` and `docker.socket` systemd units) to prevent the creation of new containers during the process.
   - Use `ip link show type veth` to ensure no veth interfaces are present.
   - Execute `sudo rmmod veth` to unload the currently loaded original veth module.
   - Execute `sudo modprobe -v veth` to load the patched module. The command prints the path of the module it actually loads; confirm it is the patched one.
   - Start the docker daemon and all needed containers.

The change made above persists across reboots, as long as the next boot uses exactly the same kernel as the one currently running. If the kernel has been upgraded since this boot, execute the first 8 steps against the kernel version that will be booted next: install that kernel's development package in step 3, create a fresh directory in step 6, and replace every `$(uname -r)` with the exact kernel release of the next boot.

To revert the changes, simply remove the installed `veth.ko` from the `updates/` directory, re-run `depmod`, and follow step 9 to replace the currently loaded veth module.
attie-argentum commented 7 months ago

Adding nr_cpus=56 in my case has allowed the system to run fine until yesterday... longer than before, perhaps, but certainly not a "fix".

bendem commented 7 months ago

If you are really affected by this bug, I recommend downgrading your kernel to a version <= 5.14 provided by your Linux distribution.

RHEL8 is affected with kernel 4.18.0-513.18.1.el8_9.x86_64. Has someone reported the problem to them already? I'm guessing they won't care since they don't support docker in the first place, but it probably has an impact on other things.

ExpliuM commented 7 months ago

We also suffer from this issue on RHEL 8.9 with kernel version 4.18.0-513.11.1.el8_9.x86_64.

pschoen-itsc commented 7 months ago

Adding nr_cpus=56 in my case has allowed the system to run fine until yesterday... longer than before, perhaps, but certainly not a "fix".

The idea behind the nr_cpus workaround is to reduce the number of CPUs the kernel thinks the machine has. This works well with VMs, because one VM normally has far fewer cores than the host system could provide. If you actually want to use all 56 cores, the workaround does not help much. For us, with smaller VMs (4-6 cores), it works without any problems.
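The mismatch is visible directly in sysfs; the kernel sizes per-CPU structures for the possible range, not just the online CPUs (the values below are examples):

```bash
cat /sys/devices/system/cpu/possible   # e.g. 0-239 on an over-provisioned hypervisor
cat /sys/devices/system/cpu/online     # e.g. 0-3 for the vCPUs actually assigned
```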

nblazincic commented 7 months ago

We are facing this issue with Docker 26 on Ubuntu 22.04 LTS. As far as I can see, neither the nr_cpus nor the vm.swappiness workaround fixed this issue. Is this a confirmed kernel issue or a docker problem?

shankerwangmiao commented 7 months ago

We are facing this issue with Docker 26 on Ubuntu 22.04 LTS. As far as I can see, neither the nr_cpus nor the vm.swappiness workaround fixed this issue. Is this a confirmed kernel issue or a docker problem?

Can you look into the kernel startup log and find the following line:

smpboot: Allowing XX CPUs, X hotplug CPUs

and see how many CPUs are allocated?
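For example:

```bash
dmesg | grep -i smpboot
# or, if the boot messages have rotated out of the kernel ring buffer:
journalctl -k -b | grep -i smpboot
```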

nblazincic commented 7 months ago

@shankerwangmiao Thank you for your quick reply. kernel: smpboot: Allowing 240 CPUs, 238 hotplug CPUs. Do you think nr_cpus or disabling CPU hot-add on the hypervisor could fix the issue? The machines have 2 vCPUs assigned.

shankerwangmiao commented 7 months ago

@shankerwangmiao Thank you for your quick reply. kernel: smpboot: Allowing 240 CPUs, 238 hotplug CPUs. Do you think nr_cpus or disabling CPU hot-add on the hypervisor could fix the issue? The machines have 2 vCPUs assigned.

Yes, either specifying nr_cpus=2 or disabling CPU hot-add on the hypervisor side should work around this issue.

Currently, neither Debian nor Ubuntu has released a kernel package that includes this patch.
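One way to check whether a given distribution kernel already carries the fix is to search its changelog (grepping for "veth" is an assumption about the changelog wording, so read the matches):

```bash
# Debian/Ubuntu
apt changelog linux-image-$(uname -r) | grep -i veth
# RHEL and derivatives
rpm -q --changelog kernel | grep -i veth
```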

rdelangh commented 6 months ago

I have exactly the same error. Running Ubuntu 23.10, kernel 6.5.0-28, 56 processors, 755 GB RAM (744 GB free).

Awaiting the release (soon, normally) of Ubuntu 24.04... with (I assume) a patched kernel.

nblazincic commented 6 months ago

@shankerwangmiao's solution was correct in our case. We have no more issues. Thank you.

rdelangh commented 6 months ago

I have exactly the same error. Running Ubuntu 23.10, kernel 6.5.0-28, 56 processors, 755 GB RAM (744 GB free).

Awaiting the release (soon, normally) of Ubuntu 24.04... with (I assume) a patched kernel.

Still on my Ubuntu 23.10, I have downgraded the kernel to 5.15.151, because the messages above list this release as one of the patched kernels:

# uname -r
5.15.151-0515151-generic

Using this Dockerfile:

FROM internetsystemsconsortium/bind9
ENV TZ MET
CMD [ "/usr/sbin/named", "-4", "-f", "-u", "bind" ]
VOLUME /store/central/dns/secondary /etc/bind9
VOLUME /dev/log /dev/log

I built the image "my_named_img" and launched a container with the command "bash", so I could start the process interactively (and capture the errors):

$ podman run -p 10053:53 -p 10053:53/udp --name bind9-container-slave1 -it -e TZ=MET -v /store/central/dns/primary/cfg:/etc/bind -v /dev/log:/dev/log my_named_img:latest /bin/bash
root@c28c01733e24:/# apt install -y strace
root@c28c01733e24:/# strace -f /usr/sbin/named -4 -u bind 2>&1
...
[pid   485] mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid   485] brk(0x55dc3ad94000)         = 0x55dc3ad73000
[pid   485] mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid   485] futex(0x7fd58f375210, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid   485] getpid()                    = 484
[pid   485] sendto(3, "<26>May 21 14:43:56 named[484]: "..., 88, MSG_NOSIGNAL, NULL, 0) = 88
[pid   485] mprotect(0x7fd5880da000, 4096, PROT_READ|PROT_WRITE) = -1 ENOMEM (Cannot allocate memory)
[pid   485] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd380000000
[pid   485] munmap(0x7fd384000000, 67108864) = 0
[pid   485] mprotect(0x7fd380000000, 135168, PROT_READ|PROT_WRITE) = -1 ENOMEM (Cannot allocate memory)
[pid   485] munmap(0x7fd380000000, 67108864) = 0
[pid   485] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid   485] getpid()                    = 484
[pid   485] sendto(3, "<26>May 21 14:43:56 named[484]: "..., 74, MSG_NOSIGNAL, NULL, 0) = 74
[pid   485] getpid()                    = 484
[pid   485] sendto(3, "<26>May 21 14:43:56 named[484]: "..., 133, MSG_NOSIGNAL, NULL, 0) = 133
[pid   485] getpid()                    = 484
[pid   485] sendto(3, "<26>May 21 14:43:56 named[484]: "..., 116, MSG_NOSIGNAL, NULL, 0) = 116
[pid   485] getpid()                    = 484
...
[pid   485] sendto(3, "<26>May 21 14:43:56 named[484]: "..., 66, MSG_NOSIGNAL, NULL, 0) = 66
[pid   485] rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
[pid   485] gettid()                    = 485
[pid   485] getpid()                    = 484
[pid   485] tgkill(484, 485, SIGABRT)   = 0
[pid   485] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=484, si_uid=100} ---
[pid   591] <... futex resumed>)        = 0
[pid   577] <... futex resumed>)        = ? <unavailable>
[pid   594] <... futex resumed>)        = 0
[pid   576] <... futex resumed>)        = 0
[pid   596] <... futex resumed>)        = ?
[pid   597] <... futex resumed>)        = ?
[pid   595] <... futex resumed>)        = ?
...
[pid   487] <... futex resumed>)        = ?
[pid   486] <... futex resumed>)        = ?
[pid   484] <... rt_sigtimedwait resumed> <unfinished ...>) = ?
[pid   547] +++ killed by SIGABRT (core dumped) +++
[pid   552] +++ killed by SIGABRT (core dumped) +++
[pid   560] +++ killed by SIGABRT (core dumped) +++
[pid   563] +++ killed by SIGABRT (core dumped) +++
[pid   483] <... read resumed>"", 1)    = 0
[pid   485] +++ killed by SIGABRT (core dumped) +++
[pid   484] +++ killed by SIGABRT (core dumped) +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=484, si_uid=100, si_status=SIGABRT, si_utime=2 /* 0.02 s */, si_stime=5 /* 0.05 s */} ---
exit_group(1)                           = ?
+++ exited with 1 +++
root@c28c01733e24:/#

What surprises me are the many different PIDs, as if "named" is spawning many child processes...

doonydoo commented 6 months ago

I had vm.overcommit_memory = 2 and got the same error. The solution for me: vm.overcommit_memory = 0.
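In case it helps anyone else, applying and persisting it (the sysctl.d path is the usual location on both Debian- and RHEL-family systems):

```bash
sudo sysctl -w vm.overcommit_memory=0                                        # apply now
echo 'vm.overcommit_memory = 0' | sudo tee /etc/sysctl.d/99-overcommit.conf  # persist
sudo sysctl --system                                                         # reload
```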

shankerwangmiao commented 6 months ago

Debian has released a kernel package with this fix:

rdelangh commented 5 months ago

I had vm.overcommit_memory = 2 and got the same error. The solution for me: vm.overcommit_memory = 0.

@shankerwangmiao : on my Ubuntu 24.04, kernel 6.8.0-31, my "overcommit_memory" setting is already zero (0), but I still get the error.

shankerwangmiao commented 5 months ago

I had vm.overcommit_memory = 2 and got the same error. The solution for me: vm.overcommit_memory = 0.

@shankerwangmiao : on my Ubuntu 24.04, kernel 6.8.0-31, my "overcommit_memory" setting is already zero (0), but I still get the error.

I believe 6.8.0 should not be affected by this. Can you show the output of uname -a? Also, can you find the following line in the kernel dmesg:

smpboot: Allowing XX CPUs, X hotplug CPUs

and see how many CPUs are allocated?

Do you see dockerd: page allocation failure: order:X in dmesg when the container fails to start?

shankerwangmiao commented 5 months ago

I had vm.overcommit_memory = 2 and got the same error. The solution for me: vm.overcommit_memory = 0.

@shankerwangmiao : on my Ubuntu 24.04, kernel 6.8.0-31, my "overcommit_memory" setting is already zero (0), but I still get the error.

I've seen your previously posted log, and I have to say your problem is not related to this issue at all. Although the words "cannot allocate memory" appear in the title, this issue happens when the container runtime tries to create a veth pair before starting the container. In your case, I can clearly see that the container started successfully, since you got a bash shell inside it. The error happened when you started the named process in that container and it got -ENOMEM from the mprotect syscall, which is clearly not normal, but may be caused by various things.
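If you want to chase that separately, the first things I would check are an address-space rlimit or a cgroup memory limit applied to the container, for example:

```bash
# run these inside the failing container
ulimit -v                        # virtual address-space limit, in KiB ("unlimited" if none)
cat /sys/fs/cgroup/memory.max    # cgroup v2 memory limit, if that file exists
```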

After all, this tracker is for docker-related issues, while you are using podman ....

rdelangh commented 5 months ago

@shankerwangmiao : OK, clear - indeed I seem to be hitting a different issue, not related to the veth interfaces. Sorry for the noise ;-)

skast96 commented 1 month ago

I am 100% sure it has something to do with our VPS hoster. If the kernel settings of the virtualized server are not set correctly, this error appears after some time: memory can't be released and keeps filling up, and a restart fixes the problem because the memory is released when the VPS reboots. Our hoster https://www.easyname.at/en did give us more kernel space, but that only pushed the problem into an uncertain future.

hufon commented 1 month ago

Does anyone know if the problem exists in RHEL9 kernels? The patch seems to have been applied there: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/blob/main/drivers/net/veth.c?ref_type=heads&blame=0#L1411