RedPill-TTG / redpill-lkm

Linux kernel module for RedPill
GNU General Public License v3.0

execve() shim unreliable on some platforms #3

Closed by ttg-public 2 years ago

ttg-public commented 2 years ago

Users are reporting failures to boot on some platforms, which can be linked to the execve() shim. It is possibly a memory leak or an issue with comparing kernel memory against user memory:

Stack traces:

[    7.001987] BUG: unable to handle kernel paging request at 0000000000fcdaa8
[    7.002545] IP: [<ffffffff812c3bd2>] strcmp+0x12/0x30
[    7.002925] PGD 26d58e067 PUD 270fec067 PMD 274f25067 PTE 800000027fbfa067
[    7.002925] Oops: 0001 [#166] PREEMPT SMP
[    7.002925] Modules linked in: redpill(OE)
[    7.002925] CPU: 3 PID: 3565 Comm: ash Tainted: G      D    OE   4.4.59+ #25556
[    7.002925] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[    7.002925] task: ffff88026ccd6040 ti: ffff880273490000 task.ti: ffff880273490000
[    7.002925] RIP: 0010:[<ffffffff812c3bd2>]  [<ffffffff812c3bd2>] strcmp+0x12/0x30
[    7.002925] RSP: 0018:ffff880273493f18  EFLAGS: 00010282
[    7.002925] RAX: ffffffffa00008e0 RBX: ffffffffa0009aa0 RCX: 000000000000058b
[    7.002925] RDX: 0000000000fcda48 RSI: ffff880274e94641 RDI: 0000000000fcdaa9
[    7.002925] RBP: ffff880273493f18 R08: 0000000000000000 R09: 688c3cb33593c6ff
[    7.002925] R10: 000000000000058b R11: 0000000000000206 R12: 0000000000fcdaa8
[    7.002925] R13: 0000000000fcda28 R14: 0000000000fcda48 R15: 0000000000000000
[    7.002925] FS:  00007feb66f65700(0000) GS:ffff88027fd80000(0000) knlGS:0000000000000000
[    7.002925] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.002925] CR2: 0000000000fcdaa8 CR3: 0000000271d8c000 CR4: 00000000003606f0
[    7.002925] Stack:
[    7.002925]  ffff880273493f48 ffffffffa000090c 0000000000fcdaa8 0000000000fcda28
[    7.002925]  0000000000000000 0000000000fcda48 0000000000fcda48 ffffffff81578b8a
[    7.002925]  00000000fc2c9fc5 00007feb6676e070 0000000000000001 00007feb66f679e0
[    7.002925] Call Trace:
[    7.002925]  [<ffffffffa000090c>] shim_sys_execve+0x2c/0x90 [redpill]
[    7.002925]  [<ffffffff81578b8a>] entry_SYSCALL_64_fastpath+0x1e/0x92
[    7.002925] Code: f7 48 8d 76 01 48 8d 52 01 0f b6 4e ff 84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48 8d 7f 01 48 8d 76 01 <0f> b6 47 ff 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31 c0 5d c3 66
[    7.002925] RIP  [<ffffffff812c3bd2>] strcmp+0x12/0x30
[    7.002925]  RSP <ffff880273493f18>
[    7.002925] CR2: 0000000000fcdaa8
[    7.002925] ---[ end trace 354c1394de4cdfde ]---
[    3.517024] BUG: unable to handle kernel paging request at 0000000001f4b498
[    3.517026] IP: [<ffffffff8127b794>] strcmp+0x4/0x30
[    3.517028] PGD 3a879067 PUD 3aed6067 PMD 39e17067 PTE 8000000001bf7067
[    3.517029] Oops: 0001 [#31] SMP 
[    3.517031] Modules linked in: redpill(OF)
[    3.517032] CPU: 2 PID: 4913 Comm: ash Tainted: GF     D    O 3.10.105 #25556
[    3.517033] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[    3.517034] task: ffff88003dd79040 ti: ffff88003ad94000 task.ti: ffff88003ad94000
[    3.517036] RIP: 0010:[<ffffffff8127b794>]  [<ffffffff8127b794>] strcmp+0x4/0x30
[    3.517037] RSP: 0018:ffff88003ad97f28  EFLAGS: 00010206
[    3.517037] RAX: ffffffffa0000c10 RBX: ffffffffa0009500 RCX: 000000000000058b
[    3.517038] RDX: 0000000001f554e8 RSI: ffff880038246b60 RDI: 0000000001f4b499
[    3.517039] RBP: 0000000001f4b498 R08: 0000000000000000 R09: 0000000000000066
[    3.517039] R10: 000000000000058b R11: 0000000000000202 R12: 0000000001f4b510
[    3.517040] R13: 0000000001f554e8 R14: 0000000000000000 R15: 0000000000000000
[    3.517041] FS:  00007fce85811700(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
[    3.517042] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.517042] CR2: 0000000001f4b498 CR3: 0000000039e6e000 CR4: 00000000003607e0
[    3.517043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.517044] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    3.517045] Stack:
[    3.517046]  ffffffffa0000c36 0000000000000000 0000000000000000 0000000000000000
[    3.517048]  0000000000000000 ffffffff814cfdc4 0000000000000000 0000000001f554e8
[    3.517049]  0000000000000000 0000000001f4b510 0000000001f554e8 0000000001f4b498
[    3.517050] Call Trace:
[    3.517052]  [<ffffffffa0000c36>] ? shim_sys_execve+0x26/0x90 [redpill]
[    3.517054]  [<ffffffff814cfdc4>] ? system_call_fastpath+0x22/0x27
[    3.517062] Code: 0f 1f 80 00 00 00 00 48 83 c6 01 0f b6 4e ff 48 83 c2 01 84 c9 88 4a ff 75 ed f3 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 83 c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 75 0f 84 c0 75 eb 31 c0 c3 0f 
[    3.517064] RIP  [<ffffffff8127b794>] strcmp+0x4/0x30
[    3.517065]  RSP <ffff88003ad97f28>
[    3.517065] CR2: 0000000001f4b498
[    3.517066] ---[ end trace 1310bcd7b39d15af ]---
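
Both oopses fault inside strcmp() called from shim_sys_execve(), and CR2 holds a low userspace address with a present PTE. That fits the second hypothesis above: a userspace pointer is being dereferenced directly in kernel context, which may fault only on platforms where the CPU enforces such access restrictions (for example SMAP). Below is a minimal sketch of the usual pattern for this situation: copy the string into a kernel buffer with strncpy_from_user() and compare the copy. It is not necessarily how the module fixes it; the helper name `user_path_equals` and the `SHIM_PATH_BUF` limit are assumptions made for illustration, and the `#else` branch exists only so the sketch can be compiled outside the kernel.

```c
#ifdef __KERNEL__
#include <linux/errno.h>
#include <linux/slab.h>      /* kmalloc, kfree */
#include <linux/string.h>    /* strcmp */
#include <linux/uaccess.h>   /* strncpy_from_user */
#else
/* Userspace stand-ins so the sketch can be compiled and tried outside the kernel. */
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#define __user
#define kmalloc(size, flags) malloc(size)
#define kfree(ptr) free(ptr)
/* Mimics the kernel contract: string length on success, count if too long, -EFAULT on a bad pointer. */
static long strncpy_from_user(char *dst, const char *src, long count)
{
    long i;

    if (!src)
        return -EFAULT;
    for (i = 0; i < count; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return i;
    }
    return count;
}
#endif

#define SHIM_PATH_BUF 4096   /* hypothetical buffer size, not taken from the module */

/*
 * Hypothetical helper: returns 1 if the userspace string equals kpath,
 * 0 if it differs, or a negative errno if it could not be copied.
 */
static int user_path_equals(const char __user *upath, const char *kpath)
{
    char *buf;
    long len;
    int equal;

    buf = kmalloc(SHIM_PATH_BUF, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;

    /* Copy through the uaccess API instead of touching upath directly. */
    len = strncpy_from_user(buf, upath, SHIM_PATH_BUF);
    if (len < 0 || len >= SHIM_PATH_BUF) {
        kfree(buf);
        return len < 0 ? (int)len : -ENAMETOOLONG;
    }

    equal = strcmp(buf, kpath) == 0;   /* both operands are kernel memory now */
    kfree(buf);
    return equal;
}
```

A shim written this way would call `user_path_equals(filename, ...)` where it previously called strcmp() on the raw pointer, and would treat a negative return value as "do not intercept" and pass the call through to the original execve().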

Reports:

lazosweb commented 2 years ago

Gadreel here... Just to let you know: on Apollolake, when ttsy reinit was set to true, the loader was not crashing. Now that you have set it to false, both Bromolow and Apollolake crash, with the same result whether it's 6.4 or 7.0...

ttg-public commented 2 years ago

It's frustrating that we cannot replicate this reliably but we may have found a fix. It's implemented in https://github.com/RedPill-TTG/redpill-lkm/commit/dffcf018cef3fe847c6d5891525526ba26cbb0b3

Can you try it, @lazosweb?

semool commented 2 years ago

It's working fine here now with the CPU set to Host in my Proxmox machine.

Scoobdriver commented 2 years ago

Sorry, still an issue for me on ESXi 6.7. Edit: my mistake, I am now able to build on ESXi 6.7.

ttg-public commented 2 years ago

> sorry. Still an issue for me on esxi 6.7.

Hmm, on one of our ESXi 6.7 systems it does work flawlessly. Are you triple sure you're running the newest version of the kernel module? Check if you get any output by typing `dmesg | grep 'RedPill v'` - it should show which exact commit the LKM is running.

lazosweb commented 2 years ago

Gadreel here, it's working fine on my end. I tried Apollolake 6.2.4 (success), Bromolow 7.0 (success) and Apollolake 7.0 (unrelated issue). I have an issue with Apollolake 7.0 and could not finish installing it but I doubt it's related to this bug. Great work @ttg-public

Scoobdriver commented 2 years ago

> > sorry. Still an issue for me on esxi 6.7.
>
> Hmm, on one of our ESXi 6.7 systems it does work flawlessly. Are you triple sure you're running the newest version of the kernel module? Check if you get any output by typing `dmesg | grep 'RedPill v'` - it should show which exact commit the LKM is running.

Apologies, I had copied the wrong .img file.

https://github.com/RedPill-TTG/redpill-lkm/commit/dffcf018cef3fe847c6d5891525526ba26cbb0b3 has resolved this issue for me.

ttg-public commented 2 years ago

Commits https://github.com/RedPill-TTG/redpill-lkm/commit/dffcf018cef3fe847c6d5891525526ba26cbb0b3 and https://github.com/RedPill-TTG/redpill-lkm/commit/140250a solved the issue.