IRATI / stack

RINA implementation for OS/Linux
http://irati.github.io/stack

Lock-up in pidm.c #1364

Open fdgonthier opened 2 years ago

fdgonthier commented 2 years ago

This looks very similar to the problem I observed with CIDM. I'll send a patch for this as soon as I can.

[3437663.796621] CPU: 27 PID: 9370 Comm: ipcm-io-thread Tainted: G           OEL   4.15.0-161-generic #169-Ubuntu
[3437663.796621] Hardware name: Dell Inc. PowerEdge R740xd/0YNX56, BIOS 2.12.2 07/09/2021
[3437663.796629] RIP: 0010:pidm_allocated+0x2c/0x60 [rina_irati_core]
[3437663.796630] RSP: 0018:ffffa53c8e2f7da8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff11
[3437663.796631] RAX: ffff8aacbc944060 RBX: 0000000000000106 RCX: 0000000000000381
[3437663.796631] RDX: ffff8aacbc944f20 RSI: 0000000000000106 RDI: ffff8a949a0b2840
[3437663.796632] RBP: ffffa53c8e2f7dc0 R08: 0000000000000000 R09: 0000000000000000
[3437663.796632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a949a0b2840
[3437663.796632] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8aacd18c6c60
[3437663.796633] FS:  00007fe620d1c700(0000) GS:ffff8aacdf340000(0000) knlGS:0000000000000000
[3437663.796634] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3437663.796634] CR2: 00007fe61c2821f8 CR3: 0000002f38318002 CR4: 00000000007606e0
[3437663.796635] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3437663.796635] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[3437663.796635] PKRU: 55555554
[3437663.796636] Call Trace:
[3437663.796643]  ? pidm_allocate+0x35/0x100 [rina_irati_core]
[3437663.796649]  kfa_port_id_reserve+0x27/0xb0 [rina_irati_core]
[3437663.796656]  notify_ipcp_allocate_flow_request+0x7a/0x280 [rina_irati_core]
[3437663.796664]  ? deserialize_irati_msg+0x334/0x8f0 [rina_irati_core]
[3437663.796671]  ctrldev_write+0xfe/0x230 [rina_irati_core]
[3437663.796673]  __vfs_write+0x1b/0x40
[3437663.796673]  vfs_write+0xb1/0x1a0
[3437663.796674]  SyS_write+0x5c/0xe0
[3437663.796676]  do_syscall_64+0x73/0x130
[3437663.796677]  entry_SYSCALL_64_after_hwframe+0x41/0xa6

fdgonthier commented 2 years ago

This loop is going to run forever if all the ports happen to be allocated. That seems unlikely, but I swear it is happening to us right now. None of the allocated ports ever ends up being used in a successful flow, so I presume that's why we do not run out of memory before running out of ports.

image

The bug is twofold: something does not free up ports, and the kernel module does not react properly to running out of ports.
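
For the second half of the bug, one way to fail cleanly is to bound the search to a single pass over the ID space and return an error when nothing is free. The following is a minimal user-space sketch of that idea, not the actual pidm.c code: `struct port_alloc`, `port_alloc_get`, `port_alloc_put`, `PORT_ID_MAX` and `PORT_ID_WRONG` are all hypothetical names, and the ID space is kept tiny on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PORT_ID_MAX   8      /* hypothetical: tiny ID space for illustration */
#define PORT_ID_WRONG (-1)   /* hypothetical sentinel meaning "no ID available" */

struct port_alloc {
    bool used[PORT_ID_MAX + 1];  /* index 0 is unused; valid IDs are 1..PORT_ID_MAX */
    int  last;                   /* last ID handed out, 0 if none yet */
};

void port_alloc_init(struct port_alloc *a)
{
    assert(a != NULL);
    memset(a->used, 0, sizeof(a->used));
    a->last = 0;
}

/*
 * Returns a free ID in 1..PORT_ID_MAX, or PORT_ID_WRONG when every ID is
 * taken. The loop makes at most PORT_ID_MAX probes, so it always terminates,
 * even when the ID space is exhausted.
 */
int port_alloc_get(struct port_alloc *a)
{
    int candidate;
    int tries;

    assert(a != NULL);
    candidate = a->last;
    for (tries = 0; tries < PORT_ID_MAX; tries++) {
        candidate = (candidate % PORT_ID_MAX) + 1;  /* advance, wrapping to 1 */
        if (!a->used[candidate]) {
            a->used[candidate] = true;
            a->last = candidate;
            return candidate;
        }
    }
    return PORT_ID_WRONG;  /* caller must treat this as an allocation failure */
}

/* Marks an ID as free again; out-of-range IDs are ignored. */
void port_alloc_put(struct port_alloc *a, int id)
{
    assert(a != NULL);
    if (id >= 1 && id <= PORT_ID_MAX)
        a->used[id] = false;
}
```

With this shape, the caller (something like a port reservation path) would see the sentinel and reject the flow request instead of spinning. In a kernel module the same loop would also need to run under whatever lock protects the allocator state, which this sketch leaves out.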