Feandil opened this issue
On Mon, Jan 14, 2013 at 07:33:23AM -0800, Feandil wrote:
> Using the included tests (make test) crashes my system. It's a remote server, so I can't really determine the cause of the panic. I'm not exactly using the standard 'make test', as I'm running under SELinux: all the changes I made can be found at https://github.com/Feandil/lerya.net-overlay/tree/master/app-admin/mcelog (one sed inside the ebuild and various patches in the files directory).
Hmm, I'm not sure how to distribute those. I would add some generic SELinux config files, but I don't think I want Gentoo-specific files.
> Is this normal? Is there a way to run the tests without crashing the system?
The kernel should not crash; this may be some kind of kernel regression.
> Here are the last logs before the crash:
> Jan 14 16:25:26 lerya kernel: [ 1202.285787] soft_offline: 0x1bc3: unknown non LRU page type 100000000000400
> Jan 14 16:25:26 lerya kernel: [ 1202.286006] MCE 0x1bc3: non LRU page recovery: Ignored
> ++++++++++++ running memdb test +++++++++++++++++++
> Please delete /tmp/tmp.XdYY3kQOia after you checked /tmp/tmp.XdYY3kQOia/*.log /tmp/tmp.XdYY3kQOia/return
> Jan 14 16:25:27 lerya kernel: [ 1203.476569] type=1400 audit(1358177127.877:195): avc: denied { getsched } for pid=3660 comm="mce-inject" ipaddr=82.67.68.201 scontext=staff_u:sysadm_r:mcelog_inject_t tcontext=staff_u:sysadm_r:mcelog_inject_t tclass=process
It looks like SELinux or something similar is preventing the MCE injection.
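The denied permission is SELinux's `process:getsched` on the mce-inject process itself (source and target context are both `mcelog_inject_t`). As far as I know, the kernel checks `getsched` for scheduler queries such as sched_getaffinity(2) and sched_getscheduler(2), and `setsched` for changes such as sched_setaffinity(2). Which of these calls mce-inject actually makes is an assumption, not something the log shows. The sketch below (the function name `probe_sched_permissions` is hypothetical) exercises those calls so you can see which one the policy blocks when run in the same domain as mce-inject; how you get into that domain depends on your policy.

```python
import os
from typing import Dict


def probe_sched_permissions(pid: int = 0) -> Dict[str, object]:
    """Call scheduler syscalls that SELinux mediates with process:getsched
    (queries) and process:setsched (changes). pid 0 = the calling process.
    A denied call shows up as an OSError (PermissionError for EACCES/EPERM)."""
    results: Dict[str, object] = {}
    try:
        cpus = os.sched_getaffinity(pid)  # sched_getaffinity(2): checked as getsched
    except OSError as exc:
        results["getaffinity"] = exc
        return results
    results["getaffinity"] = sorted(cpus)
    try:
        # Re-applying the current mask changes nothing but is still checked as setsched.
        os.sched_setaffinity(pid, cpus)  # sched_setaffinity(2)
        results["setaffinity"] = "ok"
    except OSError as exc:
        results["setaffinity"] = exc
    try:
        results["policy"] = os.sched_getscheduler(pid)  # sched_getscheduler(2): getsched
    except OSError as exc:
        results["policy"] = exc
    return results


if __name__ == "__main__":
    # Run this in the same SELinux domain as mce-inject (assumed: mcelog_inject_t).
    for name, value in probe_sched_permissions().items():
        print(f"{name}: {value}")
```

Re-applying the existing affinity mask keeps the probe side-effect free while still triggering the `setsched` check.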
> Jan 14 16:25:27 lerya kernel: [ 1203.476600] Starting machine check poll CPU 0
> Jan 14 16:25:27 lerya kernel: [ 1203.476611] Machine check poll done on CPU 0
I would need a log of the crash. Can you set up netconsole or somesuch and collect it?
-Andi
ak@linux.intel.com -- Speaking for myself only.
Thanks for your quick response :)
As for the SELinux files, the default Gentoo policy already provides some of this functionality (I'm not sure if or when it will be pushed upstream). Mine (which support mce-inject) are in https://github.com/Feandil/lerya.net-overlay/tree/master/sec-policy/selinux-mcelog/files and, except perhaps for some specific interfaces, should be portable to other distributions. I will probably try to push them into the Gentoo SELinux policy, hoping that they will go upstream someday.
I fixed the SELinux denials by adding the getsched/setsched permissions; there are no more SELinux-related denials.
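Granting those permissions presumably amounts to a type-enforcement rule along the lines of `allow mcelog_inject_t self:process { getsched setsched };`; that exact line is an assumption, and the real policy lives in the linked overlay. The standard tool for turning `avc: denied` lines into rules is audit2allow. The Python sketch below only illustrates that mapping for a single line; `avc_to_allow_rule` is a hypothetical helper, not part of any SELinux tooling.

```python
import re
import sys
from typing import Optional

# Matches the AVC part of a kernel audit line such as:
#   avc: denied { getsched } for pid=... scontext=u:r:t tcontext=u:r:t tclass=process
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]+?)\s*\}.*?"
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+tclass=(?P<tclass>\S+)"
)


def avc_to_allow_rule(line: str) -> Optional[str]:
    """Turn one AVC denial into an allow rule, roughly as audit2allow would.
    Returns None when the line contains no AVC denial."""
    m = AVC_RE.search(line)
    if m is None:
        return None
    # A security context is user:role:type[:level]; the type is the third field.
    stype = m.group("scontext").split(":")[2]
    ttype = m.group("tcontext").split(":")[2]
    target = "self" if stype == ttype else ttype
    perms = m.group("perms").split()
    perm_set = perms[0] if len(perms) == 1 else "{ " + " ".join(perms) + " }"
    return f"allow {stype} {target}:{m.group('tclass')} {perm_set};"


if __name__ == "__main__":
    # Example: dmesg | python3 avc_rules.py
    for raw in sys.stdin:
        rule = avc_to_allow_rule(raw)
        if rule:
            print(rule)
```

Fed the denial from the log in this thread, it yields `allow mcelog_inject_t self:process getsched;`. Real policy work should still go through audit2allow and a review of whether each permission is actually wanted.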
I tried to capture the actual kernel panic with netconsole, but it doesn't give much information (I used netconsole=...@.../...,...@.../... debug ignore_loglevel): https://gist.github.com/6ef6352d4556416549dd
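netconsole takes a parameter of the form `netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]` and sends kernel messages to the target as plain UDP datagrams; per the kernel's netconsole documentation, the target port defaults to 6666, and netcat is suggested as a receiver. If a scriptable receiver is more convenient, here is a minimal Python sketch. The names `open_listener` and `collect` are hypothetical, and the default port is taken from the kernel documentation rather than from this thread.

```python
import socket
import sys
from typing import List, Optional, TextIO


def open_listener(port: int = 6666, bind_addr: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket where netconsole datagrams will arrive.
    6666 is netconsole's default target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def collect(sock: socket.socket, max_messages: Optional[int] = None,
            timeout: Optional[float] = None,
            out: Optional[TextIO] = None) -> List[str]:
    """Read datagrams until max_messages have arrived, or until no datagram
    arrives for `timeout` seconds (None = wait forever). Each datagram is
    decoded leniently and, if `out` is given, written and flushed at once."""
    sock.settimeout(timeout)
    messages: List[str] = []
    while max_messages is None or len(messages) < max_messages:
        try:
            data, _sender = sock.recvfrom(65535)
        except socket.timeout:
            break
        text = data.decode("utf-8", errors="replace")
        messages.append(text)
        if out is not None:
            out.write(text)
            out.flush()  # make each line visible immediately while the tests run
    return messages


if __name__ == "__main__":
    listener = open_listener()
    try:
        collect(listener, out=sys.stdout)  # stop with Ctrl-C
    except KeyboardInterrupt:
        pass
    finally:
        listener.close()
```

Splitting binding from reading lets the socket be bound (and its port checked) before the remote machine starts sending, so no early datagrams are missed.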
Processor information (/proc/cpuinfo):
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Celeron(R) CPU 2.66GHz
stepping : 9
microcode : 0x3
cpu MHz : 2659.972
cache size : 256 KB
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts nopl pni dtes64 monitor ds_cpl tm2 cid cx16 xtpr lahf_lm
bogomips : 5319.94
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:
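The `flags` line above includes both `mce` (Machine Check Exception) and `mca` (Machine Check Architecture), the CPU features the machine-check code relies on. A small sketch for checking this on other machines follows; `parse_cpuinfo` and `machine_check_flags` are hypothetical helpers. Note that these flags only describe the CPU: whether the running kernel has the injection support the tests use is a separate question they do not answer.

```python
from typing import Dict, List


def parse_cpuinfo(text: str) -> List[Dict[str, str]]:
    """Split /proc/cpuinfo text into one {field: value} dict per processor.
    Blocks are separated by blank lines; each line is 'key<whitespace>: value'."""
    cpus: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:  # a blank line ends the current processor block
                cpus.append(current)
                current = {}
            continue
        # Split on the first ':' only, so values may themselves contain ':'.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def machine_check_flags(cpu: Dict[str, str]) -> Dict[str, bool]:
    """Report whether the mce and mca CPUID feature flags are present."""
    flags = set(cpu.get("flags", "").split())
    return {"mce": "mce" in flags, "mca": "mca" in flags}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        for cpu in parse_cpuinfo(f.read()):
            print(cpu.get("processor"), cpu.get("model name"), machine_check_flags(cpu))
```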