clearcontainers / runtime

OCI (Open Containers Initiative) compatible runtime using Virtual Machines
Apache License 2.0

Kernel 4.9 crashes inside the VM when used with Semaphore CI #91

Open sboeuf opened 7 years ago

sboeuf commented 7 years ago

For some unknown reason, we cannot use the latest kernels as the Clear Containers guest kernel when using Semaphore CI. Using some 4.9 kernels makes the kernel crash inside the VM (maybe something changed in the config and it is not related to the kernel version), while it works perfectly with kernel 4.5-50, which was the kernel released just before we switched to 4.9.4-53.

gorozco1 commented 7 years ago

The kernel is being updated to 4.9.24.

sboeuf commented 7 years ago

@gorozco1 I will give it a try with the new version, but I doubt it will work. I think something related to the kernel config is causing this crash.

gorozco1 commented 7 years ago

Which Ubuntu version is running on the host?

jodh-intel commented 7 years ago

Could you paste the crash details into this issue?

jodh-intel commented 7 years ago

I suspect it's Trusty (14.04.1) running 3.13.0-32-generic (https://semaphoreci.com/jamesodhunt/procenv/branches/master/builds/22)

jodh-intel commented 7 years ago

@sboeuf - can you confirm?

gorozco1 commented 7 years ago

With cc and the recent changes in cc-kernel to support iptables networking, we will have a co-dependency with the host kernel: it should be at least 4.x on the host (I guess). Could you confirm, @mcastelino @amshinde?

sboeuf commented 7 years ago

@jodh-intel @gorozco1 Yes, this is Ubuntu 14.04 with kernel 3.13 on the host. The iptables patches were introduced very recently and I think they were not in 4.9.4-53, so I think that's a separate issue. @jodh-intel I will paste the crash logs when I investigate this issue. I forgot to save them last week.
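
For reference, the host kernel check discussed here can be sketched in a few lines of Python. This is only an illustration: the 4.x floor is the unconfirmed guess made earlier in this thread, not a documented requirement, and the helper names `kernel_major_minor` and `host_meets_minimum` are hypothetical.

```python
import re
from typing import Tuple


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '3.13.0-32-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def host_meets_minimum(release: str, minimum: Tuple[int, int] = (4, 0)) -> bool:
    """Return True if the release is at or above the assumed minimum version.

    The default (4, 0) is an assumption taken from the guess in this thread.
    """
    return kernel_major_minor(release) >= minimum
```

On a live host the release string could be taken from `platform.release()` (the same value `uname -r` prints); for the Semaphore CI host reported above it would be `3.13.0-32-generic`.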

mcastelino commented 7 years ago

@gorozco1 @amshinde Do we support Ubuntu 14.04? In any case, you should get an error from iptables-restore, not a crash, if the host has an iptables module enabled that we do not enable in our CC kernel.

From initial testing, it was OK to have an iptables module enabled in CC but not enabled in the host kernel. So there should not be a dependency across the kernels per se, as long as we enable all the modules that are enabled across distros, unless the 3.x kernel has a module that is no longer supported in 4.x.

The logs will help.

amshinde commented 7 years ago

In the worst case, the recent kernel changes may cause the iptables-restore command to fail, but they should not cause a crash.

jodh-intel commented 7 years ago

Assigned to @sboeuf for now to ensure we don't forget to add the logs.

sboeuf commented 7 years ago

Run 1:

[    0.534654] random: systemd: uninitialized urandom read (16 bytes read)
[    0.535563] systemd[1]: Initializing machine ID from random generator.
[    0.540288] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[    0.541028] IP: [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.541028] PGD 7c503067
[    0.541028] PUD 7c54e067
[    0.541028] PMD 0
[    0.541028]
[    0.541028] Oops: 0000 [#1] SMP
[    0.541028] CPU: 0 PID: 96 Comm: systemd Tainted: G        W       4.9.35-4.9.35-62.container #1
[    0.541028] task: ffff88007d399580 task.stack: ffff88007c548000
[    0.541028] RIP: 0010:[<ffffffff812a60d7>]  [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.541028] RSP: 0018:ffff88007c54bce0  EFLAGS: 00010046
[    0.541028] RAX: 0000000000000000 RBX: ffff88007cae4c00 RCX: ffff88007c5b6cc0
[    0.541028] RDX: 00007f54d375b828 RSI: ffff88007cae4c28 RDI: ffff88007c5b6cd0
[    0.541028] RBP: ffff88007c54bce0 R08: 00007f54d375b828 R09: 0000000000000000
[    0.541028] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007c5b6cc0
[    0.541028] R13: 0000000000000000 R14: ffff88007d399600 R15: 0000000000000000
[    0.541028] FS:  00007f54d30ed140(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[    0.541028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.541028] CR2: 0000000000000008 CR3: 000000007c504000 CR4: 00000000001406b0
[    0.541028] Stack:
[    0.541028]  ffff88007c54bcf0 ffffffff81068437 ffff88007c54bd70 ffffffff8106c2b6
[    0.541028]  0000000100000000 ffff88007d399600 ffff880000000000 0000000000000246
[    0.541028]  0000000020338143 0000000000000025 0000000000000000 00000000203415bb
[    0.541028] Call Trace:
[    0.541028]  [<ffffffff81068437>] __enqueue_entity+0x67/0x70
[    0.541028]  [<ffffffff8106c2b6>] enqueue_entity+0x256/0xd30
[    0.541028]  [<ffffffff810778af>] ? __wake_up+0x3f/0x50
[    0.541028]  [<ffffffff8106cde7>] enqueue_task_fair+0x57/0x9f0
[    0.541028]  [<ffffffff81066d97>] ? sched_clock_local+0x17/0x80
[    0.541028]  [<ffffffff81066fa4>] ? sched_clock_cpu+0x84/0xa0
[    0.541028]  [<ffffffff8106155a>] activate_task+0x4a/0x90
[    0.541028]  [<ffffffff81062f5f>] wake_up_new_task+0xff/0x180
[    0.541028]  [<ffffffff8103fbfa>] _do_fork+0x12a/0x310
[    0.541028]  [<ffffffff8103fe64>] SyS_clone+0x14/0x20
[    0.541028]  [<ffffffff8100117a>] do_syscall_64+0x7a/0x310
[    0.541028]  [<ffffffff81594e6b>] entry_SYSCALL64_slow_path+0x25/0x25
[    0.541028] Code: 78 10 e9 73 ff ff ff 4d 89 e7 e9 0d ff ff ff 0f 1f 44 00 00 55 48 8b 17 48 89 e5 48 85 d2 0f 84 36 01 00 00 48 8b 02 a8 01 75 40 <48> 8b 48 08 49 89 c0 48 39 d1 74 7d 48 85 c9 74 31 f6 01 01 75 
[    0.541028] RIP  [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.541028]  RSP <ffff88007c54bce0>
[    0.541028] CR2: 0000000000000008
[    0.541028] ---[ end trace 27e0ee893729888c ]---

Run 2:

[    0.532783] random: systemd: uninitialized urandom read (16 bytes read)
[    0.533667] systemd[1]: Initializing machine ID from random generator.
[    0.538449] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[    0.539374] IP: [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.539374] PGD 7c568067
[    0.539374] PUD 7c569067
[    0.539374] PMD 0
[    0.539374]
[    0.539374] Oops: 0000 [#1] SMP
[    0.539374] CPU: 1 PID: 96 Comm: systemd Tainted: G        W       4.9.35-4.9.35-62.container #1
[    0.539374] task: ffff88007c585480 task.stack: ffff88007c588000
[    0.539374] RIP: 0010:[<ffffffff812a60d7>]  [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.539374] RSP: 0018:ffff88007c58bcb0  EFLAGS: 00010046
[    0.539374] RAX: 0000000000000000 RBX: ffff88007c585500 RCX: ffff88007c585500
[    0.539374] RDX: 00007f4d8fa48828 RSI: ffff88007c4ee828 RDI: ffff88007c585510
[    0.539374] RBP: ffff88007c58bcb0 R08: 00007f4d8fa48828 R09: 0000000000000000
[    0.539374] R10: 00000000656e6567 R11: ffff88007cc761b8 R12: ffff88007c4ee800
[    0.539374] R13: ffffffff8160d980 R14: ffff88007c585a20 R15: ffff88007fd15240
[    0.539374] FS:  00007f4d8f3da140(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[    0.539374] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.539374] CR2: 0000000000000028 CR3: 000000007c566000 CR4: 00000000001406a0
[    0.539374] Stack:
[    0.539374]  ffff88007c58bcc0 ffffffff81068437 ffff88007c58bd18 ffffffff8106f70e
[    0.539374]  ffff88007c58bcf0 ffffffff810617a2 ffff88007d13c180 ffff88007fd15240
[    0.539374]  ffff88007c585500 ffff88007d13c180 ffffffff8160d980 ffff88007c585a20
[    0.539374] Call Trace:
[    0.539374]  [<ffffffff81068437>] __enqueue_entity+0x67/0x70
[    0.539374]  [<ffffffff8106f70e>] put_prev_entity+0xae/0x8f0
[    0.539374]  [<ffffffff810617a2>] ? ttwu_do_wakeup+0x12/0x90
[    0.539374]  [<ffffffff8106ff6d>] put_prev_task_fair+0x1d/0x30
[    0.539374]  [<ffffffff810773e7>] pick_next_task_stop+0x27/0x40
[    0.539374]  [<ffffffff8159132d>] __schedule+0x2fd/0x600
[    0.539374]  [<ffffffff815917d6>] _cond_resched+0x26/0x40
[    0.539374]  [<ffffffff810b47fd>] stop_one_cpu+0x5d/0x80
[    0.539374]  [<ffffffff810626f0>] ? sched_ttwu_pending+0x80/0x80
[    0.539374]  [<ffffffff8106322d>] sched_exec+0x7d/0xa0
[    0.539374]  [<ffffffff81118a9a>] do_execveat_common+0x18a/0x680
[    0.539374]  [<ffffffff8111beee>] ? getname_flags+0x4e/0x190
[    0.539374]  [<ffffffff811191b8>] SyS_execve+0x28/0x30
[    0.539374]  [<ffffffff8100117a>] do_syscall_64+0x7a/0x310
[    0.539374]  [<ffffffff81052091>] ? SyS_prctl+0x41/0x460
[    0.539374]  [<ffffffff81033d52>] ? do_page_fault+0x32/0x90
[    0.539374]  [<ffffffff81594e6b>] entry_SYSCALL64_slow_path+0x25/0x25
[    0.539374] Code: 78 10 e9 73 ff ff ff 4d 89 e7 e9 0d ff ff ff 0f 1f 44 00 00 55 48 8b 17 48 89 e5 48 85 d2 0f 84 36 01 00 00 48 8b 02 a8 01 75 40 <48> 8b 48 08 49 89 c0 48 39 d1 74 7d 48 85 c9 74 31 f6 01 01 75 
[    0.539374] RIP  [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.539374]  RSP <ffff88007c58bcb0>
[    0.539374] CR2: 0000000000000008
[    0.539374] ---[ end trace 8b537d625d992aed ]---
[    0.539051] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[    0.539051] IP: [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.539051] PGD 7c500067 
[    0.539051] PUD 7c503067 
[    0.539051] PMD 0 
[    0.539051] 
[    0.539051] Oops: 0000 [#2] SMP
[    0.539051] CPU: 0 PID: 95 Comm: systemd Tainted: G      D W       4.9.35-4.9.35-62.container #1
[    0.539051] task: ffff88007c53c0c0 task.stack: ffff88007c540000
[    0.539051] RIP: 0010:[<ffffffff812a60d7>]  [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.539051] RSP: 0018:ffff88007c543ce0  EFLAGS: 00010046
[    0.539051] RAX: 0000000000000000 RBX: ffff88007c4ee300 RCX: ffff88007c584180
[    0.539051] RDX: 00007f4d8fa48828 RSI: ffff88007c4ee328 RDI: ffff88007c584190
[    0.539051] RBP: ffff88007c543ce0 R08: 00007f4d8fa48828 R09: 0000000000000000
[    0.539051] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007c584180
[    0.539051] R13: 0000000000000000 R14: ffff88007c53c140 R15: 0000000000000000
[    0.539051] FS:  00007f4d8f3da140(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[    0.539051] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.539051] CR2: 0000000000000008 CR3: 000000007c501000 CR4: 00000000001406b0
[    0.539051] Stack:
[    0.539051]  ffff88007c543cf0 ffffffff81068437 ffff88007c543d70 ffffffff8106c2b6
[    0.539051]  0000000100000000 ffff88007c53c140 ffff880000000000 0000000000000246
[    0.539051]  000000002017fe7c 0000000000000025 0000000000000000 00000000201895ec
[    0.539051] Call Trace:
[    0.539051]  [<ffffffff81068437>] __enqueue_entity+0x67/0x70
[    0.539051]  [<ffffffff8106c2b6>] enqueue_entity+0x256/0xd30
[    0.539051]  [<ffffffff810778af>] ? __wake_up+0x3f/0x50
[    0.539051]  [<ffffffff8106cde7>] enqueue_task_fair+0x57/0x9f0
[    0.539051]  [<ffffffff81066d97>] ? sched_clock_local+0x17/0x80
[    0.539051]  [<ffffffff81066fa4>] ? sched_clock_cpu+0x84/0xa0
[    0.539051]  [<ffffffff8106155a>] activate_task+0x4a/0x90
[    0.539051]  [<ffffffff81062f5f>] wake_up_new_task+0xff/0x180
[    0.539051]  [<ffffffff8103fbfa>] _do_fork+0x12a/0x310
[    0.539051]  [<ffffffff8103fe64>] SyS_clone+0x14/0x20
[    0.539051]  [<ffffffff8100117a>] do_syscall_64+0x7a/0x310
[    0.539051]  [<ffffffff81594e6b>] entry_SYSCALL64_slow_path+0x25/0x25
[    0.539051] Code: 78 10 e9 73 ff ff ff 4d 89 e7 e9 0d ff ff ff 0f 1f 44 00 00 55 48 8b 17 48 89 e5 48 85 d2 0f 84 36 01 00 00 48 8b 02 a8 01 75 40 <48> 8b 48 08 49 89 c0 48 39 d1 74 7d 48 85 c9 74 31 f6 01 01 75 
[    0.539051] RIP  [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.539051]  RSP <ffff88007c543ce0>
[    0.539051] CR2: 0000000000000008
[    0.539051] ---[ end trace 8b537d625d992aee ]---

Run 3:

[    0.560907] random: systemd: uninitialized urandom read (16 bytes read)
[    0.561884] systemd[1]: Initializing machine ID from random generator.
[    0.567855] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[    0.567855] 
[    0.568777] CPU: 1 PID: 97 Comm: systemd Tainted: G        W       4.9.35-4.9.35-62.container #1
[    0.569047]  ffff88007c583cd0 ffffffff812a05ce ffff88007fd15200 ffffffff8173d428
[    0.569047]  ffff88007c583d50 ffffffff810ca8a3 ffff880000000008 ffff88007c583d60
[    0.569047]  ffff88007c583cf8 ffffffff8106188a 0000000000000282 0000000000000000
[    0.569047] Call Trace:
[    0.569047]  [<ffffffff812a05ce>] dump_stack+0x63/0x85
[    0.569047]  [<ffffffff810ca8a3>] panic+0xd0/0x1fd
[    0.569047]  [<ffffffff8106188a>] ? ttwu_do_activate+0x6a/0x80
[    0.574378]  [<ffffffff81591627>] __schedule+0x5f7/0x600
[    0.574378]  [<ffffffff815917d6>] _cond_resched+0x26/0x40
[    0.574378]  [<ffffffff810b47fd>] stop_one_cpu+0x5d/0x80
[    0.574378]  [<ffffffff810626f0>] ? sched_ttwu_pending+0x80/0x80
[    0.574378]  [<ffffffff8106322d>] sched_exec+0x7d/0xa0
[    0.574378]  [<ffffffff81118a9a>] do_execveat_common+0x18a/0x680
[    0.574378]  [<ffffffff8111beee>] ? getname_flags+0x4e/0x190
[    0.574378]  [<ffffffff811191b8>] SyS_execve+0x28/0x30
[    0.574378]  [<ffffffff8100117a>] do_syscall_64+0x7a/0x310
[    0.574378]  [<ffffffff81052091>] ? SyS_prctl+0x41/0x460
[    0.574378]  [<ffffffff81033d52>] ? do_page_fault+0x32/0x90
[    0.574378]  [<ffffffff81594e6b>] entry_SYSCALL64_slow_path+0x25/0x25

Run 4:

[    0.554210] random: systemd: uninitialized urandom read (16 bytes read)
[    0.554679] systemd[1]: Initializing machine ID from random generator.
[    0.559569] general protection fault: 0000 [#1] SMP
[    0.559577] ------------[ cut here ]------------
[    0.559580] WARNING: CPU: 1 PID: 96 at kernel/cgroup.c:782 css_set_move_task+0x215/0x250
[    0.559582] CPU: 1 PID: 96 Comm: systemd Tainted: G        W       4.9.35-4.9.35-62.container #1
[    0.559584]  ffff88007c56bcc0 ffffffff812a05ce 0000000000000000 0000000000000000
[    0.559584]  ffff88007c56bd00 ffffffff8104078c 0000030e0477f000 ffff88007c540400
[    0.559585]  0000000000000000 ffff88007d393500 ffff88007d393cb0 ffff88007d366d80
[    0.559585] Call Trace:
[    0.559588]  [<ffffffff812a05ce>] dump_stack+0x63/0x85
[    0.559590]  [<ffffffff8104078c>] __warn+0xbc/0xe0
[    0.559591]  [<ffffffff81040868>] warn_slowpath_null+0x18/0x20
[    0.559591]  [<ffffffff810adda5>] css_set_move_task+0x215/0x250
[    0.559592]  [<ffffffff810b09ef>] cgroup_post_fork+0xbf/0xd0
[    0.559593]  [<ffffffff8103f686>] copy_process.part.6+0x1586/0x18e0
[    0.559594]  [<ffffffff8103fbad>] _do_fork+0xdd/0x310
[    0.559594]  [<ffffffff8103fe64>] SyS_clone+0x14/0x20
[    0.559596]  [<ffffffff8100117a>] do_syscall_64+0x7a/0x310
[    0.559598]  [<ffffffff81594e6b>] entry_SYSCALL64_slow_path+0x25/0x25
[    0.559599] ---[ end trace 82bfbfca2222881e ]---
[    0.560456] CPU: 0 PID: 11 Comm: migration/0 Tainted: G        W       4.9.35-4.9.35-62.container #1
[    0.560456] task: ffff88007d0fcac0 task.stack: ffff88007d104000
[    0.560456] RIP: 0010:[<ffffffff812a6169>]  [<ffffffff812a6169>] rb_insert_color+0xa9/0x190
[    0.560456] RSP: 0018:ffff88007d107cf0  EFLAGS: 00010006
[    0.560456] RAX: ffff88007d392be8 RBX: ffff88007c4cd200 RCX: ffff88007fd15b78
[    0.560456] RDX: ffff88007fd15b78 RSI: ffff88007c4cd228 RDI: ffff88007d392bd0
[    0.560456] RBP: ffff88007d107cf0 R08: ffff88007d392be8 R09: 2f2f2f2f00000000
[    0.560456] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007d392bc0
[    0.560456] R13: 0000000000000001 R14: ffff88007c4e7580 R15: 0000000000000000
[    0.560456] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[    0.560456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.560456] CR2: 00007f70210be850 CR3: 000000007c57c000 CR4: 00000000001406b0
[    0.560456] Stack:
[    0.560456]  ffff88007d107d00 ffffffff81068437 ffff88007d107d80 ffffffff8106c2b6
[    0.560456]  ffff88007d107d80 ffff88007c4e7580 0000000000000000 01ff880000000002
[    0.560456]  0000000000000000 0000000000000011 0000000000000001 00000000215a4f1b
[    0.560456] Call Trace:
[    0.560456]  [<ffffffff81068437>] __enqueue_entity+0x67/0x70
[    0.560456]  [<ffffffff8106c2b6>] enqueue_entity+0x256/0xd30
[    0.560456]  [<ffffffff8106cde7>] enqueue_task_fair+0x57/0x9f0
[    0.560456]  [<ffffffff81066f6b>] ? sched_clock_cpu+0x4b/0xa0
[    0.560456]  [<ffffffff81061d45>] move_queued_task+0xd5/0x110
[    0.560456]  [<ffffffff81061f3d>] __migrate_task+0x2d/0x40
[    0.560456]  [<ffffffff81062786>] migration_cpu_stop+0x96/0xa0
[    0.560456]  [<ffffffff810626f0>] ? sched_ttwu_pending+0x80/0x80
[    0.560456]  [<ffffffff810b46d6>] cpu_stopper_thread+0x86/0x110
[    0.560456]  [<ffffffff8105ddd0>] ? sort_range+0x20/0x20
[    0.560456]  [<ffffffff8105ded5>] smpboot_thread_fn+0x105/0x160
[    0.560456]  [<ffffffff8105aa02>] kthread+0xd2/0xf0
[    0.560456]  [<ffffffff8105a930>] ? __kthread_create_on_node+0x140/0x140
[    0.560456]  [<ffffffff81595015>] ret_from_fork+0x25/0x30
[    0.560456] Code: 8b 08 48 89 0a 48 83 e1 fc 48 89 10 0f 84 ae 00 00 00 48 3b 41 10 0f 84 9e 00 00 00 48 89 51 08 5d c3 4c 8b 48 10 4d 85 c9 74 06 <41> f6 01 01 74 43 48 8b 51 10 48 39 fa 0f 84 98 00 00 00 48 85 
[    0.560456] RIP  [<ffffffff812a6169>] rb_insert_color+0xa9/0x190
[    0.560456]  RSP <ffff88007d107cf0>
[    0.560456] ---[ end trace 82bfbfca2222881f ]---
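
To compare the runs above without reading every trace by hand, here is a minimal Python sketch that counts the faulting symbol reported on the final `RIP  [<...>]` line of each oops. The regular expression is written against the 4.9-style output pasted in this issue and is an assumption about that format only; the helper name `faulting_symbols` is hypothetical.

```python
import re
from collections import Counter

# Matches the closing line of an oops, for example:
#   [    0.541028] RIP  [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
# It deliberately skips the earlier "RIP: 0010:[<...>]" register line,
# so each oops is counted once.
RIP_RE = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+([\w.]+)\+(0x[0-9a-f]+)/0x[0-9a-f]+")


def faulting_symbols(log: str) -> Counter:
    """Count 'symbol+offset' for every oops summary line found in the log text."""
    return Counter(f"{m.group(1)}+{m.group(2)}" for m in RIP_RE.finditer(log))


# Short excerpt copied from the runs above.
sample = """\
[    0.541028] RIP: 0010:[<ffffffff812a60d7>]  [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.541028] RIP  [<ffffffff812a60d7>] rb_insert_color+0x17/0x190
[    0.560456] RIP  [<ffffffff812a6169>] rb_insert_color+0xa9/0x190
"""

for symbol, count in faulting_symbols(sample).items():
    print(symbol, count)
```

Run against the full output above, this should report three oopses at `rb_insert_color+0x17` (Run 1 and both oopses in Run 2) and one at `rb_insert_color+0xa9` (Run 4). Run 3 is a panic with no `RIP` summary line, so it does not show up in the count.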