Open fdgonthier opened 3 years ago
Hard crash again, display in VirtualBox console.
/home/neumann/irati/kernel/cidm.c:113
}
while (cidm_allocated(instance, cep_id)) {
15b: 89 de mov %ebx,%esi
15d: 4c 89 e7 mov %r12,%rdi
160: e8 00 00 00 00 callq 165 <cidm_allocate+0x35>
161: R_X86_64_PC32 cidm_allocated-0x4
165: 85 c0 test %eax,%eax
167: 74 2c je 195 <cidm_allocate+0x65>
This is the offending line in the first oops.
This crash is an accidental side-effect of the Go program, which is intended to search for a specific problem that we've observed: a fraction of the time (indicating a timing race), when a flow is closed the opposite end of the flow does not close and terminate a waiting read() until the waiting program running in gdb is paused and resumed, at which point the read() correctly terminates with an eof. This indicates that the read() syscall is in an interruptible wait state, and when it is interrupted (by the gdb signaling) and later resumed, the re-executed syscall correctly detects that the close has occurred and terminates normally. Other times, the read() immediately returns eof when notification of the other-end close arrives, as desired.
This is causing a lot of trouble in a program that opens and closes flows in a goroutine -- the goroutines get hung in a read() after the other end closes, and after a while, the hung goroutines holding onto file descriptors causes the process to run out of available file descriptors and crashes.
In order to attempt to identify the race condition that exercises the hanging read(), this Go program (at a high level) creates a flow between a client and server, and then with random timing reads from one end and closes at the other to see if the read() completes with eof vs. hangs. As an unintended side-effect of running it, it crashes Linux. The fail-to-close bug will be filed separately, but pursuing it is hampered by the crashing problem.
This could be related to earlier closed issue #1133, which was closed as not repeatable. If it is, then the timing issue in that case was exercised when the commands were being executed manually. Opening the flow at the server ( <xx ) end would be immediately followed by an eof and close() of xx, and then a close on the flow. So the last CDAP message(s) from the server to start and to close the flow might be delivered to the IPCP while the process running the read() is still blocked in the rina_flow_alloc() syscall, and could both be processed before the read() is executed, or after, depending on timing. Just a thought -- that might explain the unrepeatability (depending on the timing of execution on the hardware, number of CPUs, etc.).
The first version of the program was a bit of a mess. This program is somewhat simplified but still exposes the problem.
Steve has suggested me I add a more complete you of the whole kernel log.
The warning messages Aug 11 14:17:15 rina1 kernel: [ 120.520294] rina-kfa(ERR): There is no flow bound to port-id XX
are actually fairly common and will actually happen when running other RINA programs. They are actually so common for me that I considered them to be line noise at first but they might need to get looked into as they always precede the kernel Oops.
Aug 11 14:16:30 rina1 kernel: [ 76.110115] rina_irati_core: loading out-of-tree module taints kernel.
Aug 11 14:16:30 rina1 kernel: [ 76.110270] rina_irati_core: module verification failed: signature and/or required key missing - taint
ing kernel
Aug 11 14:16:30 rina1 kernel: [ 76.115402] rina-pidm(INFO): Instance initialized successfully (2047 port-ids)
Aug 11 14:16:30 rina1 kernel: [ 76.115991] rina-core(INFO): IRATI RINA implementation v1.4.1 initialized
Aug 11 14:16:30 rina1 kernel: [ 76.115992] rina-core(INFO): Don't panic ...
Aug 11 14:16:30 rina1 kernel: [ 76.127595] rina-ps-factory(INFO): policy-set 'default' published successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127597] rina-rina-default-plugin(INFO): RMT default policy set loaded successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127597] rina-ps-factory(INFO): policy-set 'default' published successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127598] rina-rina-default-plugin(INFO): DTP default policy set loaded successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127599] rina-ps-factory(INFO): policy-set 'default' published successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127599] rina-rina-default-plugin(INFO): DTCP default policy set loaded successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127600] rina-ps-factory(INFO): policy-set 'default' published successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127601] rina-rina-default-plugin(INFO): Delimiting default policy set loaded successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127601] rina-ps-factory(INFO): policy-set 'default' published successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127602] rina-rina-default-plugin(INFO): PFF default policy set loaded successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127602] rina-ps-factory(INFO): policy-set 'default' published successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127603] rina-rina-default-plugin(INFO): SDU Protection default Crypto policy set loaded successful
ly
Aug 11 14:16:30 rina1 kernel: [ 76.127604] rina-ps-factory(INFO): policy-set 'CRC32' published successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127604] rina-rina-default-plugin(INFO): SDU Protection default error check policy set loaded succe
ssfully
Aug 11 14:16:30 rina1 kernel: [ 76.127605] rina-ps-factory(INFO): policy-set 'default' published successfully
Aug 11 14:16:30 rina1 kernel: [ 76.127605] rina-rina-default-plugin(INFO): SDU Protection default TTL policy set loaded successfully
Aug 11 14:16:30 rina1 kernel: [ 76.160475] rina-ipcp-factories(INFO): Factory 'normal-ipc' registered successfully
Aug 11 14:16:30 rina1 kernel: [ 76.178123] rina-arp826-tables(INFO): ARP826 tables initialized successfully
Aug 11 14:16:30 rina1 kernel: [ 76.178242] rina-arp826-arm(INFO): ARM initialized successfully
Aug 11 14:16:30 rina1 kernel: [ 76.178243] rina-arp826-core(INFO): ARP826 Initialized successfully
Aug 11 14:16:30 rina1 kernel: [ 76.190613] rina-shim-eth-vlan(INFO): shim-eth-vlan initialized
Aug 11 14:16:30 rina1 kernel: [ 76.190614] rina-ipcp-factories(INFO): Factory 'shim-eth-vlan' registered successfully
Aug 11 14:16:30 rina1 kernel: [ 76.190615] rina-shim-eth-vlan(INFO): Shim Wifi AP/STA initialized
Aug 11 14:16:30 rina1 kernel: [ 76.190616] rina-ipcp-factories(INFO): Factory 'shim-wifi-ap' registered successfully
Aug 11 14:16:30 rina1 kernel: [ 76.190617] rina-shim-eth-vlan(INFO): Shim Wifi AP/STA initialized
Aug 11 14:16:30 rina1 kernel: [ 76.190618] rina-ipcp-factories(INFO): Factory 'shim-wifi-sta' registered successfully
Aug 11 14:16:30 rina1 kernel: [ 76.203049] rina-shim-tcp-udp(INFO): shim-tcp-udp initialized
Aug 11 14:16:30 rina1 kernel: [ 76.203051] rina-ipcp-factories(INFO): Factory 'shim-tcp-udp' registered successfully
Aug 11 14:16:30 rina1 kernel: [ 76.265302] 8021q: 802.1Q VLAN Support v1.8
Aug 11 14:16:30 rina1 kernel: [ 76.265309] 8021q: adding VLAN 0 to HW filter on device enp0s3
Aug 11 14:16:30 rina1 kernel: [ 76.265302] 8021q: 802.1Q VLAN Support v1.8
Aug 11 14:16:30 rina1 kernel: [ 76.265309] 8021q: adding VLAN 0 to HW filter on device enp0s3
Aug 11 14:16:31 rina1 kernel: [ 76.643444] rina-shim-eth-vlan(WARN): DIF name not a VLAN number, using full Ethernet interface: shim.
DIF
Aug 11 14:16:31 rina1 kernel: [ 76.643880] rina-cidm(INFO): Instance initialized successfully (2047 cep-ids)
Aug 11 14:16:31 rina1 kernel: [ 76.759593] rina-rmt(INFO): RMT PS to be selected: default
Aug 11 14:16:31 rina1 kernel: [ 76.759602] rina-rmt-ps-default(WARN): No PS param q_max, setting default
Aug 11 14:16:31 rina1 kernel: [ 76.759602] rina-rmt(INFO): PFF PS to be selected: default
Aug 11 14:16:31 rina1 kernel: [ 76.759603] rina-ps-factory(INFO): Policy-set 'default' already selected
Aug 11 14:17:06 rina1 kernel: [ 111.740563] rina-kfa(ERR): There is no flow bound to port-id 3
Aug 11 14:17:07 rina1 kernel: [ 113.369543] rina-kfa(ERR): There is no flow bound to port-id 12
Aug 11 14:17:15 rina1 kernel: [ 120.520294] rina-kfa(ERR): There is no flow bound to port-id 31
Aug 11 14:17:22 rina1 kernel: [ 127.983015] rina-kfa(ERR): There is no flow bound to port-id 15
Aug 11 14:17:26 rina1 kernel: [ 132.227072] rina-kfa(ERR): There is no flow bound to port-id 42
Aug 11 14:17:28 rina1 kernel: [ 134.002054] rina-kfa(ERR): There is no flow bound to port-id 11
Aug 11 14:17:30 rina1 kernel: [ 135.552884] rina-kfa(ERR): There is no flow bound to port-id 4
Aug 11 14:17:30 rina1 kernel: [ 135.809873] rina-kfa(ERR): There is no flow bound to port-id 35
Aug 11 14:17:34 rina1 kernel: [ 139.751315] rina-kfa(ERR): There is no flow bound to port-id 64
Aug 11 14:17:34 rina1 kernel: [ 139.936017] rina-kfa(ERR): There is no flow bound to port-id 24
Aug 11 14:17:35 rina1 kernel: [ 141.212696] rina-kfa(ERR): There is no flow bound to port-id 18
Aug 11 14:17:36 rina1 kernel: [ 141.618695] rina-kfa(ERR): There is no flow bound to port-id 56
Aug 11 14:17:36 rina1 kernel: [ 141.618799] rina-kfa(ERR): There is no flow bound to port-id 32
Aug 11 14:17:37 rina1 kernel: [ 142.966141] rina-kfa(ERR): There is no flow bound to port-id 70
Aug 11 14:17:39 rina1 kernel: [ 145.321709] rina-kfa(ERR): There is no flow bound to port-id 7
Aug 11 14:17:43 rina1 kernel: [ 149.132141] rina-kfa(ERR): There is no flow bound to port-id 46
Aug 11 14:17:43 rina1 kernel: [ 149.157666] rina-kfa(ERR): There is no flow bound to port-id 84
Aug 11 14:17:44 rina1 kernel: [ 149.621111] rina-kfa(ERR): There is no flow bound to port-id 50
Aug 11 14:17:45 rina1 kernel: [ 150.566104] rina-kfa(ERR): There is no flow bound to port-id 36
Aug 11 14:17:45 rina1 kernel: [ 151.291086] rina-kfa(ERR): There is no flow bound to port-id 28
Aug 11 14:17:47 rina1 kernel: [ 152.734879] rina-kfa(ERR): There is no flow bound to port-id 44
Aug 11 14:17:47 rina1 kernel: [ 152.890552] rina-kfa(ERR): There is no flow bound to port-id 59
Aug 11 14:17:48 rina1 kernel: [ 153.677035] rina-kfa(ERR): There is no flow bound to port-id 8
Aug 11 14:17:49 rina1 kernel: [ 155.097457] rina-kfa(ERR): There is no flow bound to port-id 48
Aug 11 14:17:49 rina1 kernel: [ 155.234866] rina-kfa(ERR): There is no flow bound to port-id 39
Aug 11 14:17:50 rina1 kernel: [ 156.045201] rina-kfa(ERR): There is no flow bound to port-id 68
Aug 11 14:17:51 rina1 kernel: [ 157.276625] rina-kfa(ERR): There is no flow bound to port-id 83
Aug 11 14:17:52 rina1 kernel: [ 158.195205] rina-kfa(ERR): There is no flow bound to port-id 20
Aug 11 14:17:53 rina1 kernel: [ 159.256650] rina-kfa(ERR): There is no flow bound to port-id 16
Aug 11 14:17:53 rina1 kernel: [ 159.461094] rina-kfa(ERR): There is no flow bound to port-id 27
Aug 11 14:17:54 rina1 kernel: [ 159.837369] rina-kfa(ERR): There is no flow bound to port-id 52
Aug 11 14:17:55 rina1 kernel: [ 161.426831] rina-kfa(ERR): There is no flow bound to port-id 55
Aug 11 14:17:57 rina1 kernel: [ 163.186156] rina-kfa(ERR): There is no flow bound to port-id 87
Aug 11 14:18:00 rina1 kernel: [ 166.103120] rina-kfa(ERR): There is no flow bound to port-id 94
Aug 11 14:18:00 rina1 kernel: [ 166.242120] rina-kfa(ERR): There is no flow bound to port-id 120
Aug 11 14:18:01 rina1 kernel: [ 167.372702] rina-kfa(ERR): There is no flow bound to port-id 60
Aug 11 14:18:05 rina1 kernel: [ 170.749450] rina-kfa(ERR): There is no flow bound to port-id 23
Aug 11 14:18:05 rina1 kernel: [ 171.329004] rina-kfa(ERR): There is no flow bound to port-id 112
Aug 11 14:18:06 rina1 kernel: [ 171.621573] rina-kfa(ERR): There is no flow bound to port-id 124
Aug 11 14:18:06 rina1 kernel: [ 171.622017] general protection fault: 0000 [#1] SMP PTI
Aug 11 14:18:06 rina1 kernel: [ 171.622138] Modules linked in: 8021q garp mrp stp llc shim_tcp_udp(OE) shim_eth_vlan(OE) rinarp(OE) arp826(OE) normal_ipcp(OE) rina_default_plugin(OE) rina_irati_core(OE) snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore vboxguest input_leds joydev serio_raw mac_hid intel_rapl_perf sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 vmwgfx crypto_simd glue_helper ttm cryptd drm_kms_helper psmouse syscopyarea sysfillrect sysimgblt virtio_net fb_sys_fops pata_acpi
Aug 11 14:18:06 rina1 kernel: [ 171.623555] drm ahci i2c_piix4 libahci video
Aug 11 14:18:06 rina1 kernel: [ 171.623762] CPU: 1 PID: 1907 Comm: ipcp Tainted: G OE 4.15.0-153-generic #160-Ubuntu
Aug 11 14:18:06 rina1 kernel: [ 171.623984] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Aug 11 14:18:06 rina1 kernel: [ 171.624230] RIP: 0010:cidm_allocated+0x29/0x60 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.624467] RSP: 0018:ffff9a69c1b4fcf0 EFLAGS: 00010203
Aug 11 14:18:06 rina1 kernel: [ 171.624710] RAX: 646576512c129c4a RBX: 0000000000000085 RCX: 0000000000000015
Aug 11 14:18:06 rina1 kernel: [ 171.624996] RDX: 646576512c129c4a RSI: 0000000000000085 RDI: ffff8e6db657e020
Aug 11 14:18:06 rina1 kernel: [ 171.625290] RBP: ffff9a69c1b4fd08 R08: ffff8e6dbfca60e0 R09: ffff8e6dbe003500
Aug 11 14:18:06 rina1 kernel: [ 171.625569] R10: ffffffffc088ca70 R11: 0000000000000001 R12: ffff8e6db657e020
Aug 11 14:18:06 rina1 kernel: [ 171.625835] R13: ffff8e6db5e61d80 R14: ffff8e6db628a300 R15: ffff8e6dba0aa180
Aug 11 14:18:06 rina1 kernel: [ 171.626122] FS: 00007f4a9ca70740(0000) GS:ffff8e6dbfc80000(0000) knlGS:0000000000000000
Aug 11 14:18:06 rina1 kernel: [ 171.626405] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 11 14:18:06 rina1 kernel: [ 171.626695] CR2: 000055a711d61718 CR3: 00000000351b8004 CR4: 00000000000606e0
Aug 11 14:18:06 rina1 kernel: [ 171.626997] Call Trace:
Aug 11 14:18:06 rina1 kernel: [ 171.627313] ? cidm_allocate+0x35/0x100 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.627632] efcp_connection_create+0xad/0x590 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.627950] connection_create_request+0x64/0x230 [normal_ipcp]
Aug 11 14:18:06 rina1 kernel: [ 171.628282] notify_ipcp_conn_create_req+0xa7/0x1f0 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.628622] ? deserialize_irati_msg+0x642/0x8f0 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.628984] ctrldev_write+0xfe/0x230 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.629341] __vfs_write+0x1b/0x40
Aug 11 14:18:06 rina1 kernel: [ 171.629690] vfs_write+0xb1/0x1a0
Aug 11 14:18:06 rina1 kernel: [ 171.630052] SyS_write+0x5c/0xe0
Aug 11 14:18:06 rina1 kernel: [ 171.630401] do_syscall_64+0x73/0x130
Aug 11 14:18:06 rina1 kernel: [ 171.630762] entry_SYSCALL_64_after_hwframe+0x41/0xa6
Aug 11 14:18:06 rina1 kernel: [ 171.631121] RIP: 0033:0x7f4a9bb773a7
Aug 11 14:18:06 rina1 kernel: [ 171.631481] RSP: 002b:00007ffc0a413dd0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Aug 11 14:18:06 rina1 kernel: [ 171.631856] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f4a9bb773a7
Aug 11 14:18:06 rina1 kernel: [ 171.632221] RDX: 00000000000000da RSI: 000055a711d61710 RDI: 000000000000000a
Aug 11 14:18:06 rina1 kernel: [ 171.632579] RBP: 000055a711d61710 R08: 0000000000000000 R09: 0000000000000000
Aug 11 14:18:06 rina1 kernel: [ 171.632930] R10: 0000000000000001 R11: 0000000000000293 R12: 00000000000000da
Aug 11 14:18:06 rina1 kernel: [ 171.633288] R13: 000000000000000a R14: 00000000000000da R15: 00007f4a9bb5dcd0
Aug 11 14:18:06 rina1 kernel: [ 171.633622] Code: 00 00 0f 1f 44 00 00 48 85 ff 74 2d 48 8b 17 48 39 d7 48 8b 02 74 19 3b 72 10 75 0c eb 15 8b 48 10 48 89 d0 39 f1 74 0b 48 39 f8 <48> 8b 10 75 ee 31 c0 c3 b8 01 00 00 00 c3 83 3d 72 1a 03 00 03
Aug 11 14:18:06 rina1 kernel: [ 171.634631] RIP: cidm_allocated+0x29/0x60 [rina_irati_core] RSP: ffff9a69c1b4fcf0
Aug 11 14:18:06 rina1 kernel: [ 171.635028] ---[ end trace 4cf8f459b8f68a43 ]---
Aug 11 14:18:06 rina1 kernel: [ 171.640650] ------------[ cut here ]------------
Aug 11 14:18:06 rina1 kernel: [ 171.640985] kernel BUG at /build/linux-rjVqxh/linux-4.15.0/mm/slub.c:3953!
Aug 11 14:18:06 rina1 kernel: [ 171.641331] invalid opcode: 0000 [#2] SMP PTI
Aug 11 14:18:06 rina1 kernel: [ 171.641707] Modules linked in: 8021q garp mrp stp llc shim_tcp_udp(OE) shim_eth_vlan(OE) rinarp(OE) arp826(OE) normal_ipcp(OE) rina_default_plugin(OE) rina_irati_core(OE) snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore vboxguest input_leds joydev serio_raw mac_hid intel_rapl_perf sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 vmwgfx crypto_simd glue_helper ttm cryptd drm_kms_helper psmouse syscopyarea sysfillrect sysimgblt virtio_net fb_sys_fops pata_acpi
Aug 11 14:18:06 rina1 kernel: [ 171.644904] drm ahci i2c_piix4 libahci video
Aug 11 14:18:06 rina1 kernel: [ 171.645276] CPU: 2 PID: 1904 Comm: ipcm Tainted: G D OE 4.15.0-153-generic #160-Ubuntu
Aug 11 14:18:06 rina1 kernel: [ 171.645651] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Aug 11 14:18:06 rina1 kernel: [ 171.646051] RIP: 0010:kfree+0x13a/0x180
Aug 11 14:18:06 rina1 kernel: [ 171.646426] RSP: 0018:ffff9a69c1b1fd38 EFLAGS: 00010246
Aug 11 14:18:06 rina1 kernel: [ 171.646802] RAX: ffffbf5300de3720 RBX: ffff8e6dbc413ce0 RCX: ffffbf5300d20000
Aug 11 14:18:06 rina1 kernel: [ 171.647258] RDX: 0000000000000000 RSI: 0000000000026160 RDI: 0000719200000000
Aug 11 14:18:06 rina1 kernel: [ 171.647436] rina-kfa(ERR): There is no flow bound to port-id 117
Aug 11 14:18:06 rina1 kernel: [ 171.647651] RBP: ffff9a69c1b1fd50 R08: ffff8e6db6bb2400 R09: 000000018010000f
Aug 11 14:18:06 rina1 kernel: [ 171.648395] rina-kfa(ERR): There is no flow bound to port-id 117
Aug 11 14:18:06 rina1 kernel: [ 171.648780] R10: ffffbf5300f104c0 R11: 0000000000000001 R12: ffff8e6db657e020
Aug 11 14:18:06 rina1 kernel: [ 171.648782] R13: ffffffffc07fe33e R14: ffff8e6db43d1200 R15: ffff8e6db43d1220
Aug 11 14:18:06 rina1 kernel: [ 171.650238] FS: 00007f0786d93700(0000) GS:ffff8e6dbfd00000(0000) knlGS:0000000000000000
Aug 11 14:18:06 rina1 kernel: [ 171.650647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 11 14:18:06 rina1 kernel: [ 171.651051] CR2: 00007f0778001148 CR3: 0000000036200005 CR4: 00000000000606e0
Aug 11 14:18:06 rina1 kernel: [ 171.651546] Call Trace:
Aug 11 14:18:06 rina1 kernel: [ 171.651964] rkfree+0xe/0x10 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.652378] cidm_destroy+0x29/0x70 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.652794] efcp_container_destroy+0x35/0x80 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.653263] normal_destroy+0x142/0x1d0 [normal_ipcp]
Aug 11 14:18:06 rina1 kernel: [ 171.653329] rina-kfa(ERR): There is no flow bound to port-id 98
Aug 11 14:18:06 rina1 kernel: [ 171.653681] kipcm_ipc_destroy+0x65/0x110 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.654102] rina-kfa(ERR): There is no flow bound to port-id 98
Aug 11 14:18:06 rina1 kernel: [ 171.654513] notify_destroy_ipcp+0x3b/0xd0 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.654519] ctrldev_write+0xfe/0x230 [rina_irati_core]
Aug 11 14:18:06 rina1 kernel: [ 171.655705] __vfs_write+0x1b/0x40
Aug 11 14:18:06 rina1 kernel: [ 171.656096] vfs_write+0xb1/0x1a0
Aug 11 14:18:06 rina1 kernel: [ 171.656475] SyS_write+0x5c/0xe0
Aug 11 14:18:06 rina1 kernel: [ 171.656844] do_syscall_64+0x73/0x130
Aug 11 14:18:06 rina1 kernel: [ 171.657216] entry_SYSCALL_64_after_hwframe+0x41/0xa6
Aug 11 14:18:06 rina1 kernel: [ 171.657700] RIP: 0033:0x7f07894dc3a7
Aug 11 14:18:06 rina1 kernel: [ 171.658073] RSP: 002b:00007f0786d92600 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Aug 11 14:18:06 rina1 kernel: [ 171.658419] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f07894dc3a7
Aug 11 14:18:06 rina1 kernel: [ 171.658791] RDX: 0000000000000014 RSI: 00007f0778000f20 RDI: 0000000000000004
Aug 11 14:18:06 rina1 kernel: [ 171.659425] RBP: 00007f0778000f20 R08: 0000000000000000 R09: 0000000000000000
Aug 11 14:18:06 rina1 kernel: [ 171.659910] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000014
Aug 11 14:18:06 rina1 kernel: [ 171.660229] R13: 0000000000000004 R14: 0000000000000014 R15: 0000000000000002
Aug 11 14:18:06 rina1 kernel: [ 171.660539] Code: c4 18 48 89 da 4c 89 ee e8 84 26 9b 00 49 8b 04 24 48 85 c0 75 e3 e9 f0 fe ff ff 49 8b 02 f6 c4 80 75 0a 49 8b 42 20 a8 01 75 02 <0f> 0b 49 8b 02 31 f6 f6 c4 80 74 04 41 8b 72 6c 4c 89 d7 e8 2e
Aug 11 14:18:06 rina1 kernel: [ 171.661488] RIP: kfree+0x13a/0x180 RSP: ffff9a69c1b1fd38
Aug 11 14:18:06 rina1 kernel: [ 171.661875] ---[ end trace 4cf8f459b8f68a44 ]---
I searched in my embedded development Bag-O-Trick and remembered about ftrace. I've modified the Go program so that it triggers kernel function tracing while the program is running and let it run until it triggered a crash. This is the whole kernel function trace for the system while the program is running. I have not tried to analyze it yet.
2 more trace files, in different format. Those are limited to the CPU where the process is run.
I have more information. I enabled a few debug options in the kernel and got this.
[ 3195.814568] BUG: KASAN: use-after-free in cidm_allocated+0x188/0x1c0 [rina_irati_core]
[ 3195.820563] Call Trace:
[ 3195.820572] dump_stack+0x121/0x1ad
[ 3195.828677] ? arch_local_irq_restore+0xd/0xd
[ 3195.828682] ? printk+0xb2/0xe3
[ 3195.828686] ? log_store.cold+0x11/0x11
[ 3195.828691] ? entry_SYSCALL_64_after_hwframe+0x46/0xbb
[ 3195.828699] print_address_description+0x6c/0x228
[ 3195.828734] ? cidm_allocated+0x188/0x1c0 [rina_irati_core]
[ 3195.896757] entry_SYSCALL_64_after_hwframe+0x46/0xbb
....
[ 3195.935557] The buggy address belongs to the object at ffff88802d111930
[ 3195.935557] which belongs to the cache kmalloc-32 of size 32
[ 3195.937130] The buggy address is located 16 bytes inside of
[ 3195.937130] 32-byte region [ffff88802d111930, ffff88802d111950)
[ 3195.938602] The buggy address belongs to the page:
[ 3195.939215] page:ffffea0000b44440 count:1 mapcount:0 mapping:0000000000000000 index:0x0
[ 3195.940264] flags: 0xfffffc0000100(slab)
[ 3195.940745] raw: 000fffffc0000100 0000000000000000 0000000000000000 0000000100550055
[ 3195.941624] raw: dead000000000100 dead000000000200 ffff888021403800 0000000000000000
[ 3195.942584] page dumped because: kasan: bad access detected
[ 3195.943129]
[ 3195.943256] Memory state around the buggy address:
[ 3195.944006] ffff88802d111800: fc fc fb fb fb fb fc fc 00 00 00 fc fc fc fb fb
[ 3195.944843] ffff88802d111880: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc
[ 3195.945661] >ffff88802d111900: fb fb fb fb fc fc fb fb fb fb fc fc 00 00 00 fc
[ 3195.946555] ^
[ 3195.947195] ffff88802d111980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb
[ 3195.948049] ffff88802d111a00: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc
[ 3195.948762] ==================================================================
Thanks for all this work debugging the issue, and sorry for not having been able to contribute so far! Hopefully next week I can allocate some time. May your latest PR #1358 have fixed this issue?
Yes #1358 seems to be fixing this bug. IRATI kernel modules look lot more stable for me right now. I believe this can be closed.
This is onto me since apparently my patch was not 100% correct. Since it improves stability I believe I'm not that far from the correct patch but I just go this crash, which is obviously related to the same problem.
[166861.536274] CPU: 29 PID: 28948 Comm: ipcp Tainted: G OEL 5.4.0-80-generic #90~18.04.1-Ubuntu
[166861.536275] Hardware name: Dell Inc. PowerEdge R740xd/0YNX56, BIOS 2.2.11 06/13/2019
[166861.536277] RIP: 0010:native_queued_spin_lock_slowpath+0x68/0x1f0
[166861.536278] Code: 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 20 85 c0 75 0c b8 01 00 00 00 66 89 07 5d c3 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 eb ec f6 c4 01 75 04 c6
[166861.536279] RSP: 0018:ffffb451ce747c88 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[166861.536280] RAX: 0000000000c00101 RBX: ffff8e2cd2da7dc0 RCX: ffff8e2cd2da7dc0
[166861.536280] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8e2cd2da7dd4
[166861.536280] RBP: ffffb451ce747c88 R08: 0000000000000002 R09: 000000000002a640
[166861.536281] R10: ffffffffc0a06820 R11: 0000000000132602 R12: 0000000000000064
[166861.536281] R13: ffff8e2cd2da7dd4 R14: ffff8e2cc72210c0 R15: ffff8e2c9cdc7100
[166861.536282] FS: 00007f2fbfd00740(0000) GS:ffff8e2cdf980000(0000) knlGS:0000000000000000
[166861.536283] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[166861.536283] CR2: 00005593fcd796b8 CR3: 0000002fb2d5e003 CR4: 00000000007606e0
[166861.536284] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[166861.536284] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[166861.536284] PKRU: 55555554
[166861.536285] Call Trace:
[166861.536287] _raw_spin_lock+0x1f/0x30
[166861.536299] cidm_allocated+0x25/0x90 [rina_irati_core]
[166861.536308] cidm_allocate+0x39/0x140 [rina_irati_core]
[166861.536316] efcp_connection_create+0xad/0x590 [rina_irati_core]
[166861.536322] ? generic_alloc+0x21/0x70 [rina_irati_core]
[166861.536324] connection_create_arrived+0x60/0x250 [normal_ipcp]
[166861.536331] notify_ipcp_conn_create_arrived+0xb5/0x210 [rina_irati_core]
[166861.536340] ? deserialize_irati_msg+0x642/0x8f0 [rina_irati_core]
[166861.536348] ctrldev_write+0x10e/0x240 [rina_irati_core]
[166861.536350] __vfs_write+0x1b/0x40
[166861.536351] vfs_write+0xb1/0x1a0
[166861.536352] ksys_write+0xa7/0xe0
[166861.536353] __x64_sys_write+0x1a/0x20
[166861.536355] do_syscall_64+0x57/0x190
[166861.536356] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[166861.536357] RIP: 0033:0x7f2fbee163a7
[166861.536358] Code: 44 00 00 41 54 55 49 89 d4 53 48 89 f5 89 fb 48 83 ec 10 e8 5b fd ff ff 4c 89 e2 41 89 c0 48 89 ee 89 df b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 94 fd ff ff 48
[166861.536358] RSP: 002b:00007ffc25f3b180 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[166861.536359] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f2fbee163a7
[166861.536359] RDX: 00000000000000da RSI: 00005593fccc5f10 RDI: 000000000000000a
[166861.536360] RBP: 00005593fccc5f10 R08: 0000000000000000 R09: 0000000000000000
[166861.536361] R10: 0000000000000001 R11: 0000000000000293 R12: 00000000000000da
[166861.536361] R13: 000000000000000a R14: 00000000000000da R15: 0000000000000000
The attached Go program reproduces fairly reliably a crash that happens within the IRATI kernel modules. 2 stack trace appear in quick succession.
main.go.txt