cormander / tpe-lkm

Trusted Path Execution (TPE) Linux Kernel Module

Random kernel panics on boot #28

Closed NickJH closed 5 years ago

NickJH commented 6 years ago

Following a thread on the ELRepo mailing list, I am investigating intermittent kernel panics when running ClearOS 7.5 in a VBox VM on a Win10 host. Most of the time the system boots correctly, but it fails with a panic in about a quarter of boots. I've attached a screenshot of the crash.

[Screenshot: VirtualBox console, ClearOS 7.x, 29-06-2018 08:02:50]

The distro can be downloaded from http://mirror.clearos.com/clearos/7/iso/x86_64/ClearOS-DVD-x86_64.iso (all versions use the same ISO). This may give you a 7.4 installation, as 7.5 was only released to the update channel on Friday. You will need to install and select the "Community" version (Community is on 7.5; Home and Business will stay on 7.4 for a couple of weeks, and the repos act a little strangely during this period, especially while you are on a 30-day trial). You will probably have to register the system at https://www.clearcenter.com/. Then I suggest you run "yum update", which may put you on 7.5 if the installer didn't.

My compiled tpe is available from my server here: https://www.howitts.co.uk/clearos/ClearOS_7.x/kmod-tpe-2.0.3-6.20170731git.el7_5.elrepo.x86_64.rpm

If you need to set up a development environment, instructions are at https://www.clearos.com/clearfoundation/development/clearos/content:en_us:dev_development_environment

If you have any questions, please ask. I understand VBox can be set up to capture console output via a serial port, but I don't know how.

[edit] AFAIK, the ClearOS kernel is an EL7 kernel with IMQ added for QoS, so all kmod drivers need to be recompiled against the ClearOS kernel before they can be used. The driver I linked to has been recompiled, so it should be directly usable. [/edit]

cormander commented 6 years ago

Hi,

I got this installed in a VBox VM on a Windows 10 host this evening. I was able to trigger the issue on the third reboot after the update to 7.5.

This is a kernel oops, and this distro enables panic_on_oops by default, so to get the full stack trace I added the following line to /etc/sysctl.d/tpe.conf:

kernel.panic_on_oops = 0
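
To confirm the override is actually in effect after a reboot (or after applying it with "sysctl --system"), the live value can be read back from /proc/sys/kernel/panic_on_oops; "sysctl kernel.panic_on_oops" shows the same thing. Below is a minimal C sketch of that check. The helper names parse_sysctl_int and panic_on_oops_enabled are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of an integer sysctl file, e.g. "0\n" or "1\n".
 * Returns 0 and stores the value in *out, or -1 if the text is malformed. */
int parse_sysctl_int(const char *text, long *out)
{
    char *end;
    long v;

    assert(text != NULL && out != NULL);
    errno = 0;
    v = strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;                       /* overflow or no digits at all */
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;                           /* tolerate the trailing newline */
    if (*end != '\0')
        return -1;                       /* junk after the number */
    *out = v;
    return 0;
}

/* Returns 1 if an oops will currently panic the kernel, 0 if it will not,
 * and -1 if /proc/sys/kernel/panic_on_oops cannot be read or parsed. */
int panic_on_oops_enabled(void)
{
    char buf[32];
    long v;
    FILE *f = fopen("/proc/sys/kernel/panic_on_oops", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    if (parse_sysctl_int(buf, &v) != 0)
        return -1;
    return v != 0;
}
```

Once tpe.conf has been applied, panic_on_oops_enabled() should return 0 on the VM, meaning an oops is logged instead of taking the box down.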

With that set, the system still boots fully when it runs into this issue, and the relevant information ends up in /var/log/messages. The stack trace is:

Jul 10 01:20:57 localhost kernel: tpe: loading out-of-tree module taints kernel.
Jul 10 01:20:57 localhost kernel: tpe: module verification failed: signature and/or required key missing - tainting kernel
Jul 10 01:20:57 localhost kernel: tpe: added to kernel
Jul 10 01:20:57 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Jul 10 01:20:57 localhost kernel: IP: [<ffffffffc021576c>] fopskit_security_mmap_file+0x2c/0x7f [tpe]
Jul 10 01:20:57 localhost kernel: PGD 0
Jul 10 01:20:57 localhost kernel: Oops: 0000 [#1] SMP
Jul 10 01:20:57 localhost kernel: Modules linked in: tpe(OE) ip_tables xfs libcrc32c sd_mod crc_t10dif sr_mod cdrom crct10dif_generic ata_generic pata_acpi ahci ata_piix libahci libata e1000 crct10dif_pclmul crct10dif_common crc32c_intel serio_raw dm_mirror dm_region_hash dm_log dm_mod
Jul 10 01:20:57 localhost kernel: CPU: 0 PID: 487 Comm: journalctl Tainted: G           OE  ------------   3.10.0-862.6.3.v7.x86_64 #1
Jul 10 01:20:57 localhost kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Jul 10 01:20:57 localhost kernel: task: ffffa01cf76daf70 ti: ffffa019f68d8000 task.ti: ffffa019f68d8000
Jul 10 01:20:57 localhost kernel: RIP: 0010:[<ffffffffc021576c>]  [<ffffffffc021576c>] fopskit_security_mmap_file+0x2c/0x7f [tpe]
Jul 10 01:20:57 localhost kernel: RSP: 0018:ffffa019f68dbd60  EFLAGS: 00010202
Jul 10 01:20:57 localhost kernel: RAX: 0000000000000000 RBX: ffffffffc0217ec0 RCX: ffffa019f68dbda0
Jul 10 01:20:57 localhost kernel: RDX: 0000000000000004 RSI: ffffffff873b3d21 RDI: 0000000000000000
Jul 10 01:20:57 localhost kernel: RBP: ffffa019f68dbd90 R08: 0000000000000022 R09: 0000000000000000
Jul 10 01:20:57 localhost kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000008000
Jul 10 01:20:57 localhost kernel: R13: ffffffff874d1430 R14: ffffa019f68dbda0 R15: ffffffff873b3d21
Jul 10 01:20:57 localhost kernel: FS:  0000000000000000(0000) GS:ffffa01d09200000(0000) knlGS:0000000000000000
Jul 10 01:20:57 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 10 01:20:57 localhost kernel: CR2: 0000000000000020 CR3: 0000000036b94000 CR4: 00000000000606f0
Jul 10 01:20:57 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 10 01:20:57 localhost kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 10 01:20:57 localhost kernel: Call Trace:
Jul 10 01:20:57 localhost kernel: [<ffffffff87352af4>] ? ftrace_ops_list_func+0xf4/0x120
Jul 10 01:20:57 localhost kernel: [<ffffffff87925264>] ftrace_regs_call+0x5/0x81
Jul 10 01:20:57 localhost kernel: [<ffffffff874d1435>] ? security_mmap_file+0x5/0xa0
Jul 10 01:20:57 localhost kernel: [<ffffffff874d1435>] ? security_mmap_file+0x5/0xa0
Jul 10 01:20:57 localhost kernel: [<ffffffff873b3d21>] ? vm_mmap_pgoff+0x61/0x120
Jul 10 01:20:57 localhost kernel: [<ffffffff873cc7c6>] SyS_mmap_pgoff+0x116/0x270
Jul 10 01:20:57 localhost kernel: [<ffffffff879216d5>] ? system_call_after_swapgs+0xa2/0x146
Jul 10 01:20:57 localhost kernel: [<ffffffff879216e1>] ? system_call_after_swapgs+0xae/0x146
Jul 10 01:20:57 localhost kernel: [<ffffffff879216d5>] ? system_call_after_swapgs+0xa2/0x146
Jul 10 01:20:57 localhost kernel: [<ffffffff879216e1>] ? system_call_after_swapgs+0xae/0x146
Jul 10 01:20:57 localhost kernel: [<ffffffff8722fc32>] SyS_mmap+0x22/0x30
Jul 10 01:20:57 localhost kernel: [<ffffffff87921795>] system_call_fastpath+0x1c/0x21
Jul 10 01:20:57 localhost kernel: [<ffffffff879216e1>] ? system_call_after_swapgs+0xae/0x146
Jul 10 01:20:57 localhost kernel: Code: 48 8b 04 25 80 0e 01 00 80 3d 10 2a 00 00 00 48 8b 80 70 06 00 00 48 8b 79 70 48 8b 15 06 2a 00 00 48 8b 40 70 74 0b 48 c1 ea 03 <48> 83 3c d0 00 75 0b 48 85 ff 74 06 f6 41 68 04 75 0a 66 90 c3
Jul 10 01:20:57 localhost kernel: RIP  [<ffffffffc021576c>] fopskit_security_mmap_file+0x2c/0x7f [tpe]
Jul 10 01:20:57 localhost kernel: RSP <ffffa019f68dbd60>
Jul 10 01:20:57 localhost kernel: CR2: 0000000000000020
Jul 10 01:20:57 localhost kernel: ---[ end trace 13733ed63d0b8df6 ]---
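
For anyone reading the trace: "fopskit_security_mmap_file+0x2c/0x7f [tpe]" means RIP was 0x2c bytes into a 0x7f-byte function in the tpe module, and x86 kernels label a fault a "NULL pointer dereference" when the faulting address (CR2, here 0x20) is below PAGE_SIZE. The marked faulting instruction, <48> 83 3c d0 00, decodes as cmpq $0x0,(%rax,%rdx,8); with RAX=0 and RDX=4 from the register dump, that address is exactly 0x20. The C sketch below pulls those fields out of an oops line. The struct and function names are hypothetical, and the 4096-byte page size is an assumption that holds for x86_64.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "symbol+offset/size [module]" location as printed in an oops. */
struct oops_loc {
    char symbol[128];
    unsigned long offset;  /* bytes into the function where RIP pointed */
    unsigned long size;    /* total length of the function in bytes */
    char module[64];       /* empty for symbols in the core kernel */
};

/* Parse e.g. "fopskit_security_mmap_file+0x2c/0x7f [tpe]".
 * Returns 0 on success, -1 if the text does not have that shape. */
int parse_oops_loc(const char *text, struct oops_loc *loc)
{
    int n;

    assert(text != NULL && loc != NULL);
    memset(loc, 0, sizeof *loc);
    /* %lx accepts the "0x" prefix; "[^]]" reads up to the closing bracket. */
    n = sscanf(text, "%127[^+]+%lx/%lx [%63[^]]]",
               loc->symbol, &loc->offset, &loc->size, loc->module);
    /* 3 fields: core-kernel symbol; 4 fields: symbol in a loadable module. */
    return (n == 3 || n == 4) ? 0 : -1;
}

/* x86 kernels report "NULL pointer dereference" when the faulting
 * address (CR2) is below PAGE_SIZE; 4096 is assumed here (x86_64). */
int is_near_null(unsigned long fault_addr)
{
    return fault_addr < 4096UL;
}
```

Fed the IP line above, parse_oops_loc yields symbol "fopskit_security_mmap_file", offset 0x2c, size 0x7f and module "tpe", and is_near_null(0x20) is true.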

That's all the time I have for tonight - I'll continue taking a look at what's causing this over the next few days.

Thanks for the report.

NickJH commented 6 years ago

I am wondering if the problem has gone away with another kernel update to kernel-3.10.0-862.9.1.v7.x86_64. I have not been using my VM much, but it has not crashed since the update at the end of July. Today the kernel went up to 3.10.0-862.11.6.
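
One caveat when checking whether a box is on 862.9.1 or later: comparing release strings with strcmp gets 862.11.6 versus 862.9.1 backwards, because '1' sorts before '9'. A simplified numeric comparison is sketched below in C. It is an illustration only, not rpm's own rpmvercmp logic, and the function names kernel_release_cmp and running_kernel_at_least are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <sys/utsname.h>

/* Compare two kernel release strings such as "3.10.0-862.9.1.v7.x86_64"
 * component by component, treating each run of digits as a number.
 * Returns <0, 0 or >0 like strcmp. Comparison stops at the first
 * non-numeric component (e.g. "v7"), so distro and arch suffixes are ignored. */
int kernel_release_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
        char *ea;
        char *eb;
        unsigned long na = strtoul(a, &ea, 10);
        unsigned long nb = strtoul(b, &eb, 10);
        if (na != nb)
            return na < nb ? -1 : 1;
        a = ea;
        b = eb;
        /* Skip one '.' or '-' separator before the next component. */
        if (*a == '.' || *a == '-')
            a++;
        if (*b == '.' || *b == '-')
            b++;
    }
    /* If only one side still has a numeric component, it is the newer one. */
    return (isdigit((unsigned char)*a) != 0) - (isdigit((unsigned char)*b) != 0);
}

/* Returns 1 if the running kernel is at least min_release, 0 if it is
 * older, and -1 if uname() fails. */
int running_kernel_at_least(const char *min_release)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return kernel_release_cmp(u.release, min_release) >= 0;
}
```

For example, on 3.10.0-862.11.6.v7.x86_64, running_kernel_at_least("3.10.0-862.9.1") returns 1, whereas a plain strcmp would report that release as older.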