Closed dbaxa closed 7 years ago
You'll need to narrow it down a lot more than this. I can't do the work for you, since I don't have access to a machine where it doesn't work. That means testing a vanilla kernel, narrowing down which release introduced the problem and narrowing down which commit introduces the problem. If you can provide logs, I can perhaps do something, but otherwise there's nothing to work with.
You'll need to confirm that this only happens with linux-hardened before it can be considered a bug.
You'll need to narrow it down a lot more than this. I can't do the work for you, since I don't have access to a machine where it doesn't work. That means testing a vanilla kernel, narrowing down which release introduced the problem and narrowing down which commit introduces the problem. If you can provide logs, I can perhaps do something, but otherwise there's nothing to work with.
@thestinger of course. I totally understand.
You'll need to confirm that this only happens with linux-hardened before it can be considered a bug.
Will do.
@thestinger The vanilla 4.11.2 kernel works well. Here is a diff of the kernel configuration:
diff -Nur hardened-config .config
--- hardened-config 2017-05-24 09:30:57.330645881 +1000
+++ .config 2017-05-23 14:23:34.909738303 +1000
@@ -254,9 +254,6 @@
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_SLAB_FREELIST_RANDOM is not set
-CONFIG_SLAB_CANARY=y
-CONFIG_SLAB_SANITIZE=y
-CONFIG_SLAB_SANITIZE_VERIFY=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
@@ -4985,8 +4982,6 @@
CONFIG_ENCRYPTED_KEYS=y
# CONFIG_KEY_DH_OPERATIONS is not set
CONFIG_SECURITY_DMESG_RESTRICT=y
-CONFIG_SECURITY_TIOCSTI_RESTRICT=y
-CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
@@ -4997,9 +4992,6 @@
CONFIG_HAVE_ARCH_HARDENED_USERCOPY=y
CONFIG_HARDENED_USERCOPY=y
# CONFIG_HARDENED_USERCOPY_PAGESPAN is not set
-CONFIG_FORTIFY_SOURCE=y
-CONFIG_PAGE_SANITIZE=y
-CONFIG_PAGE_SANITIZE_VERIFY=y
# CONFIG_STATIC_USERMODEHELPER is not set
# CONFIG_SECURITY_SELINUX is not set
CONFIG_SECURITY_SMACK=y
I am going to try turning off various options and will let you know what option or set or options causes the issue.
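For anyone repeating this comparison, the differing options can also be listed programmatically rather than read out of a unified diff. The following is a minimal sketch, not part of the original thread: the function names `parse_config` and `config_differences` are hypothetical, and it assumes the two files use only the `CONFIG_X=value` and `# CONFIG_X is not set` line forms shown in the diff above.

```python
import re

# The two line forms that appear in a kernel .config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def config_differences(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs.

    An option that is missing from one file is reported as None on that side.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

Calling `config_differences(open("hardened-config").read(), open(".config").read())` would give the list of options worth toggling one at a time.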
Can you try with CONFIG_FORTIFY_SOURCE disabled?
Sure. Also, the version of gcc in use appears to be 6.3.0:
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 6.3.0-12ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170406 (Ubuntu 6.3.0-12ubuntu2)
Disabling CONFIG_FORTIFY_SOURCE did not fix the issue.
Disabling CONFIG_PAGE_SANITIZE, CONFIG_PAGE_SANITIZE_VERIFY, CONFIG_SLAB_SANITIZE_VERIFY and CONFIG_SLAB_SANITIZE_VERIFY did not fix the issue either.
Also, it seems that the linux-hardened patch currently requires CONFIG_SLAB_CANARY=y.
Do you mean that it doesn't currently compile without it?
It's probably not related to the boot issue though. Lots of the changes aren't tied to configuration options; only the ones with some reason to disable them, like performance, are.
It can be built without CONFIG_SLAB_CANARY in the 4.11 branch now. I doubt your issue is related to CONFIG_SLAB_CANARY / CONFIG_SLAB_HARDENED though.
Do you mean that it doesn't currently compile without it?
Yep.
Hmm. This system uses dkms to load bbswitch; maybe that is causing an issue.
You might want to try disabling PANIC_ON_OOPS, since one of the changes this tree adds is enabling that by default. It's possible you had a kernel oops before and didn't notice.
I thought there was a way to disable panic_on_oops via the kernel line but it doesn't appear that it's possible after all.
The issue is that when I remove quiet, splash and vt.handoff from the kernel line, I am not seeing an oops.
If you're building with the default PANIC_ON_OOPS, it might panic before it is able to show you anything. You should check dmesg on a vanilla kernel and try linux-hardened with PANIC_ON_OOPS disabled.
I already had PANIC_ON_OOPS disabled and didn't see anything. I'll check dmesg on a vanilla kernel.
Is this on real hardware or a virtual machine? Can you give some details on that?
This is on a real machine, a Dell XPS 15 (2014 model). Here is an old dmesg from the machine (https://bugzilla.kernel.org/attachment.cgi?id=190581). It has integrated Intel graphics, which I use, and a GeForce GT 750M.
BTW I tagged 4.11.2.c which should let you disable CONFIG_SLAB_CANARY and CONFIG_SLAB_HARDENED to get closer to vanilla.
Great. I'll test out 4.11.2.c.
After disabling more hardening-related options, the system boots a 4.11.2.c kernel:
--- config/hardened-config 2017-05-23 13:35:07.576541812 +1000
+++ current/linux-4.11.2/.config 2017-05-26 08:07:41.045180527 +1000
@@ -254,9 +254,8 @@
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_SLAB_FREELIST_RANDOM is not set
-CONFIG_SLAB_CANARY=y
+# CONFIG_SLAB_CANARY is not set
CONFIG_SLAB_SANITIZE=y
-CONFIG_SLAB_SANITIZE_VERIFY=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SYSTEM_DATA_VERIFICATION=y
CONFIG_PROFILING=y
@@ -306,7 +305,7 @@
CONFIG_HAVE_GCC_PLUGINS=y
CONFIG_GCC_PLUGINS=y
# CONFIG_GCC_PLUGIN_CYC_COMPLEXITY is not set
-CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y
+# CONFIG_GCC_PLUGIN_LATENT_ENTROPY is not set
# CONFIG_GCC_PLUGIN_STRUCTLEAK is not set
CONFIG_HAVE_CC_STACKPROTECTOR=y
CONFIG_CC_STACKPROTECTOR=y
@@ -4948,10 +4947,10 @@
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_PRINTK_DBGP=y
CONFIG_EARLY_PRINTK_EFI=y
-# CONFIG_X86_PTDUMP_CORE is not set
+CONFIG_X86_PTDUMP_CORE=y
# CONFIG_X86_PTDUMP is not set
# CONFIG_EFI_PGT_DUMP is not set
-# CONFIG_DEBUG_WX is not set
+CONFIG_DEBUG_WX=y
CONFIG_DOUBLEFAULT=y
# CONFIG_DEBUG_TLBFLUSH is not set
# CONFIG_IOMMU_DEBUG is not set
@@ -4985,7 +4984,7 @@
CONFIG_ENCRYPTED_KEYS=y
# CONFIG_KEY_DH_OPERATIONS is not set
CONFIG_SECURITY_DMESG_RESTRICT=y
-CONFIG_SECURITY_TIOCSTI_RESTRICT=y
+# CONFIG_SECURITY_TIOCSTI_RESTRICT is not set
CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
@@ -4995,11 +4994,9 @@
CONFIG_INTEL_TXT=y
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
CONFIG_HAVE_ARCH_HARDENED_USERCOPY=y
-CONFIG_HARDENED_USERCOPY=y
-# CONFIG_HARDENED_USERCOPY_PAGESPAN is not set
-CONFIG_FORTIFY_SOURCE=y
-CONFIG_PAGE_SANITIZE=y
-CONFIG_PAGE_SANITIZE_VERIFY=y
+# CONFIG_HARDENED_USERCOPY is not set
+# CONFIG_FORTIFY_SOURCE is not set
+# CONFIG_PAGE_SANITIZE is not set
# CONFIG_STATIC_USERMODEHELPER is not set
# CONFIG_SECURITY_SELINUX is not set
CONFIG_SECURITY_SMACK=y
Can you narrow it down to a specific option? You're most of the way there already.
Current release is 4.11.2.a, so it should be narrowed down to a configuration option there (assuming disabling everything works, which means one of the configuration options controls the problematic feature).
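Narrowing a set of options down to one is a bisection problem, so each round can halve the candidates instead of toggling them one by one. The sketch below is only an illustration under stated assumptions: `find_culprit` and the `boots` callback are hypothetical names, `boots` stands in for the manual build-and-reboot cycle, and the search assumes exactly one option is responsible and that options do not interact.

```python
def find_culprit(candidates, boots):
    """Bisect a list of config options to find the one that breaks boot.

    candidates: option names that are all disabled in a known-good config.
    boots: hypothetical callback; given a set of options to enable on top
           of the known-good config, returns True if that kernel boots.
    Assumes a single culprit and no interaction between options.
    """
    candidates = list(candidates)
    if not candidates:
        raise ValueError("need at least one candidate option")
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if boots(set(half)):
            # This half is fine, so the culprit is in the remainder.
            candidates = candidates[len(half):]
        else:
            candidates = half
    culprit = candidates[0]
    # Confirm the result, since the loop never tests the last survivor alone.
    if boots({culprit}):
        raise RuntimeError("no single option reproduces the failure")
    return culprit


if __name__ == "__main__":
    # Simulated run: the "bad" option here is chosen arbitrarily for the demo.
    options = ["CONFIG_OPTION_A", "CONFIG_OPTION_B", "CONFIG_OPTION_C"]
    print(find_culprit(options, lambda enabled: "CONFIG_OPTION_B" not in enabled))
```

With seven candidate options this takes three or four builds instead of up to seven. If two options only fail together, the single-culprit assumption breaks and the final check raises instead of returning a misleading answer.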
Of course. Will do.
I am yet to find the option but the following options don't seem to trigger the issue:
CONFIG_SECURITY_TIOCSTI_RESTRICT
CONFIG_GCC_PLUGIN_LATENT_ENTROPY
BTW CONFIG_GCC_PLUGIN_LATENT_ENTROPY and HARDENED_USERCOPY are really upstream options although some extensions are made to both in linux-hardened.
Enabling SLAB_CANARY causes the system to fail to boot.
Interesting, I wonder if it's catching something or if there's a bug in an edge case code path. Can you try enabling slub_debug=FZ on your kernel line in a build without SLAB_CANARY? That enables some similar debugging checks that are upstream, and you can check the kernel log to see if they're making any noise.
Will do.
I can make a patch causing it to warn instead of trigger a kernel oops as another approach. Unfortunately I can't debug it myself without a way to reproduce the issue and so far it doesn't seem other people have run into it.
I can make a patch causing it to warn instead of trigger a kernel oops as another approach.
Sounds good to me. Especially since I don't see anything odd in my kernel logs after booting with slub_debug=FZ.
Thank you for your patience with my bug report.
FYI:
When building the kernel without SLAB_CANARY I get the following:
CC mm/slub.o
mm/slub.c: In function ‘kmem_cache_alloc_bulk’:
mm/slub.c:3222:9: warning: unused variable ‘k’ [-Wunused-variable]
int i, k;
^
@nmatt0 https://github.com/copperhead/linux-hardened/commit/786cb0888040dabe404a33039248ee6ff4407169 should fix that warning.
@dbaxa If you have time to try something else, you can change BUG_ON in this line in mm/slub.c to WARN_ON:
BUG_ON(*canary != get_canary_value(canary, value));
So, to this:
WARN_ON(*canary != get_canary_value(canary, value));
@thestinger thank you for the pointer. I'll do that.
I changed BUG_ON to WARN_ON as you suggested but that didn't seem to help. However, disabling CONFIG_SLAB_SANITIZE (I also changed CONFIG_SLUB_DEBUG_ON to y) did result in the system booting and working fine, using the 4.11.6.d patch.
@thestinger I was able to reproduce the failure to boot on another, older Dell laptop which doesn't have an NVIDIA card. This laptop also doesn't have the NVIDIA driver installed, nor is the CONFIG_DRM_NOUVEAU option enabled.
There might be multiple issues. I can't do much without a traceback though. If it doesn't work without BUG_ON changed to WARN_ON, then it sounds like there's another problem, but I don't really have anywhere to start on that.
@thestinger okay, I'll just keep changing BUG_ON to WARN_ON till something sticks :-)
@dbaxa So that laptop works with CONFIG_SLAB_CANARY disabled and not with it enabled - no changes to other options? I really need you to try to get logs.
The traceback they provided appears to be an unrelated issue after all.
@thestinger the laptop works with CONFIG_SLAB_CANARY enabled when I disable CONFIG_SLAB_SANITIZE and set CONFIG_SLUB_DEBUG_ON to y.
I am going to close this issue for now as my system can boot with a 4.11.6 patched kernel without issue with CONFIG_SLAB_SANITIZE and CONFIG_SLAB_CANARY enabled. For the record, I re-enabled CONFIG_FORTIFY_SOURCE and have left CONFIG_SLUB_DEBUG_ON as y. (I am yet to re-enable CONFIG_PAGE_SANITIZE and CONFIG_PAGE_SANITIZE_VERIFY.)
It could just be because CONFIG_SLUB_DEBUG_ON ends up enabling the upstream debug-oriented poisoning and disabling the security-oriented slub sanitization added by linux-hardened.
You probably don't want CONFIG_SLUB_DEBUG_ON for a hardened kernel, but I thought it might uncover an issue you were hitting.
It could just be because CONFIG_SLUB_DEBUG_ON ends up enabling the upstream debug-oriented poisoning and disabling the security-oriented slub sanitization added by linux-hardened.
Okay. I'll disable CONFIG_SLUB_DEBUG_ON and see what happens.
If it does break, I think you should open up a new issue, because it's a lot clearer what's going on now and the rest of the thread just confuses things.
@dbaxa The problem is that you weren't enabling SLAB_HARDENED, which should have been a dependency of SLAB_CANARY. I've corrected the issue in the 4.12 branch and it will be in the next 4.12 release.
Ubuntu 17.04 does not boot when applying either the 4.11.2.a or the 4.11.2.b patch and building a 4.11.2 kernel. Boot seems to stop just after "loading initramfs". Please let me know what I can do to provide additional details. Note: I am yet to try a vanilla 4.11.2 kernel.