clearcontainers / packaging

Packaging data for Clear Containers

Kernel crashes when a vCPU is hot added and the container is destroyed #278

Closed devimc closed 6 years ago

devimc commented 6 years ago

The kernel crashes when a container that uses SCSI and hot-plugged vCPUs is destroyed:

[  183.817790] Unregister pv shared memory for cpu 7
[  183.819428] smpboot: CPU 7 is now offline
[  183.825430] general protection fault: 0000 [#1] SMP
[  183.825681] CPU: 6 PID: 74 Comm: kworker/u480:1 Not tainted 4.9.60-80.container #4
[  183.825820] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  183.825958] task: ffff880079036e00 task.stack: ffff88007906c000
[  183.826080] RIP: 0010:[<ffffffff8139f0b7>]  [<ffffffff8139f0b7>] __virtscsi_set_affinity+0x67/0x130
[  183.826257] RSP: 0018:ffff88007906fc38  EFLAGS: 00010286
[  183.826358] RAX: 53006b6c625f6f69 RBX: 00000000ffffffff RCX: 00000000ffffff01
[  183.826495] RDX: ffffffff81a88418 RSI: 00000000ffffff01 RDI: ffff880078c68127
[  183.826662] RBP: ffff88007906fc58 R08: 0000000000000007 R09: ffffffff81a88418
[  183.826829] R10: ffffffff81a88400 R11: 0000000000013aa9 R12: ffff880078c67db0
{"level":"info","msg":"Received udev event","name":"cc-agent","pid":433,"subsystem":"udevlistener","time":"2018-02-27T16:18:54.66059929Z","udev-event":"remove","udev-path":"/sys/devices/virtual/msr/msr7"}
[  183.827007] R13: 0000000000000005 R14: ffffffff8139f180 R15: ffff880078c67f80
{"level":"info","msg":"Received udev event","name":"cc-agent","pid":433,"subsystem":"udevlistener","time":"2018-02-27T16:18:54.660808481Z","udev-event":"remove","udev-path":"/sys/devices/virtual/cpuid/cpu7"}
[  183.827253] FS:  0000000000000000(0000) GS:ffff88007b8c0000(0000) knlGS:0000000000000000
[  183.827442] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  183.827577] CR2: 00007ff3eb2a3e94 CR3: 0000000001a09000 CR4: 00000000001406a0
[  183.827744] Stack:
[  183.827808]  0000000000000007 0000000000000000 ffffffff81a24548 ffffffff8139f180
[  183.828030]  ffff88007906fc68 ffffffff8139f1a1 ffff88007906fca8 ffffffff8104154c
[  183.828228]  00000000810648d0 ffff88007b8ec5e0 0000000000000000 0000000000000007
[  183.828428] Call Trace:
[  183.828475]  [<ffffffff8139f180>] ? __virtscsi_set_affinity+0x130/0x130
[  183.828591]  [<ffffffff8139f1a1>] virtscsi_cpu_online+0x21/0x30
[  183.828699]  [<ffffffff8104154c>] cpuhp_invoke_callback+0xbc/0x120
[  183.828805]  [<ffffffff81041bfd>] cpuhp_down_callbacks+0x3d/0x80
[  183.828932]  [<ffffffff8160d8bf>] _cpu_down.constprop.8+0xaf/0x130
[  183.829039]  [<ffffffff81042305>] do_cpu_down+0x35/0x50
[  183.829120]  [<ffffffff8104269b>] cpu_down+0xb/0x10
[  183.829203]  [<ffffffff8137356f>] cpu_subsys_offline+0xf/0x20
[  183.829307]  [<ffffffff8136de67>] device_offline+0x87/0xb0
[  183.829396]  [<ffffffff812fbd0b>] acpi_bus_offline+0x9b/0x110
[  183.829496]  [<ffffffff812fdb33>] acpi_device_hotplug+0x183/0x470
[  183.829597]  [<ffffffff812f7639>] acpi_hotplug_work_fn+0x19/0x30
[  183.829700]  [<ffffffff81055aea>] process_one_work+0x1ba/0x3e0
[  183.829802]  [<ffffffff81055d56>] worker_thread+0x46/0x4f0
[  183.829892]  [<ffffffff81055d10>] ? process_one_work+0x3e0/0x3e0
[  183.829993]  [<ffffffff8105b8f2>] kthread+0xd2/0xf0
[  183.830077]  [<ffffffff8105b820>] ? __kthread_create_on_node+0x150/0x150
[  183.830180]  [<ffffffff81612995>] ret_from_fork+0x25/0x30
[  183.830260] Code: 84 24 c8 01 00 00 85 c0 74 43 45 31 ed bb ff ff ff ff 49 63 c5 48 83 c0 21 48 c1 e0 04 49 8b 7c 04 10 48 85 ff 74 18 48 8b 47 20 <48> 8b 80 d0 01 00 00 48 8b 40 58 48 85 c0 74 04 89 de ff d0 41 
[  183.831643] RIP  [<ffffffff8139f0b7>] __virtscsi_set_affinity+0x67/0x130
[  183.831766]  RSP <ffff88007906fc38>
[  183.831853] ---[ end trace 8626bb76055f2e5d ]---
[  183.831958] BUG: unable to handle kernel paging request at ffffffffffffffd8
[  183.832078] IP: [<ffffffff8105be7c>] kthread_data+0xc/0x20
PGD 1a0a067 PUD 1a0c067 PMD 0
[  183.832323] Oops: 0000 [#2] SMP
[  183.832384] CPU: 6 PID: 74 Comm: kworker/u480:1 Tainted: G      D         4.9.60-80.container #4
[  183.832533] task: ffff880079036e00 task.stack: ffff88007906c000
[  183.832634] RIP: 0010:[<ffffffff8105be7c>]  [<ffffffff8105be7c>] kthread_data+0xc/0x20
[  183.832756] RSP: 0018:ffff88007906fe78  EFLAGS: 00010002
[  183.832849] RAX: 0000000000000000 RBX: ffff88007b8d6380 RCX: 0000000000000006
[  183.832974] RDX: ffff88007b406000 RSI: ffff880079036e80 RDI: ffff880079036e00
[  183.833096] RBP: ffff88007906fe80 R08: 0000000000000000 R09: 0000000000000000
[  183.833218] R10: 0000000000002c00 R11: 0000000000000000 R12: 0000000000000000
[  183.833339] R13: ffff880079036e00 R14: ffff8800790373b8 R15: ffff88007b6d8040
[  183.833462] FS:  0000000000000000(0000) GS:ffff88007b8c0000(0000) knlGS:0000000000000000
[  183.833583] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  183.833685] CR2: 0000000000000028 CR3: 0000000001a09000 CR4: 00000000001406a0
[  183.833808] Stack:
[  183.833853]  ffffffff81055539 ffff88007906fec8 ffffffff8160ed4c ffff88007b8d6380
[  183.834036]  ffff88007906fec8 ffff880079036e00 ffff88007906ff10 ffff88007906fa48
[  183.834210]  0000000000000000 ffff88007b6d8040 ffff88007906fee0 ffffffff810648c3
[  183.834394] Call Trace:
[  183.834437]  [<ffffffff81055539>] ? wq_worker_sleeping+0x9/0x80
[  183.834539]  [<ffffffff8160ed4c>] __schedule+0x3ec/0x600
[  183.834621]  [<ffffffff810648c3>] do_task_dead+0x33/0x40
[  183.834702]  [<ffffffff81044aef>] do_exit+0x61f/0xa70
[  183.834783]  [<ffffffff81613ff7>] rewind_stack_do_exit+0x17/0x19
[  183.834889] Code: e8 ea 98 03 00 84 c0 74 d1 bf 01 00 00 00 e8 3c 99 03 00 eb c7 66 2e 0f 1f 84 00 00 00 00 00 48 8b 87 40 05 00 00 55 48 89 e5 5d <48> 8b 40 d8 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 
[  183.836267] RIP  [<ffffffff8105be7c>] kthread_data+0xc/0x20
[  183.836374]  RSP <ffff88007906fe78>
[  183.836436] CR2: ffffffffffffffd8
[  183.836498] ---[ end trace 8626bb76055f2e5e ]---
[  183.836581] Fixing recursive fault but reboot is needed!

This error does not occur with Linux kernel 4.14.x.
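
For anyone who wants to check quickly whether a given guest kernel falls on the crashing side of this report, here is a minimal sketch. It rests on an assumption: the only data points in this issue are that 4.9.60 crashes and 4.14.x does not, so treating everything older than 4.14 as possibly affected is a guess, not a confirmed range. The names `KNOWN_GOOD`, `parse_release`, and `may_be_affected` are hypothetical helpers, not part of any Clear Containers tooling.

```python
import platform
import re
from typing import Tuple

# Assumption: 4.14 is used as the cutoff only because this report says the
# crash is not seen on 4.14.x. Versions between 4.9.60 and 4.14 are untested.
KNOWN_GOOD = (4, 14)


def parse_release(release: str) -> Tuple[int, int]:
    """Return (major, minor) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def may_be_affected(release: str) -> bool:
    """True if the release is older than the version reported as not crashing."""
    return parse_release(release) < KNOWN_GOOD


if __name__ == "__main__":
    # The release string from the oops above, then the running kernel.
    print("4.9.60-80.container:", may_be_affected("4.9.60-80.container"))
    try:
        print(platform.release(), may_be_affected(platform.release()))
    except ValueError as err:
        print("could not parse the running kernel release:", err)
```

Run inside the guest, the second line reports on the running kernel. A `True` result only means the kernel is older than the one reported as working here, not that the crash will necessarily occur.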