Closed lambdaupb closed 3 years ago
/*
* The MAP_RESILIENT_* flags can be used when the caller wants to map some
* possibly unreliable memory and be able to access it safely, possibly
* getting the wrong contents rather than raising any exception.
* For safety reasons, such mappings have to be read-only (PROT_READ access
* only).
*
* MAP_RESILIENT_CODESIGN:
* accessing this mapping will not generate code-signing violations,
* even if the contents are tainted.
* MAP_RESILIENT_MEDIA:
* accessing this mapping will not generate an exception if the contents
* are not available (unreachable removable or remote media, access beyond
* end-of-file, ...). Missing contents will be replaced with zeroes.
*/
#define MAP_RESILIENT_CODESIGN 0x2000 /* no code-signing failures */
#define MAP_RESILIENT_MEDIA 0x4000 /* no backing-store failures */
It seems that this only works for read-only mappings.
That's very interesting, but I believe we cannot quite remap things here. Instead we should adjust the codesign flags as we already do, but perhaps in a slightly different manner. It may be possible that I missed some for the latest 10.15 version. Could you play with it and try setting/dropping different flags?
CC @usr-sse2 @osy86 @lvs1974 @07151129
Can easily reproduce on 10.14.6 here: run P95 large FFTs until some swapping occurs, and then try to open About This Mac. This should cause WindowServer to crash.
sudo sysctl vm.cs_debug=255
adds some more info:
2020-12-11 19:35:59.509 Df kernel[0:1f4918] vm_fault: signed: no validate: no tainted: no wpmapped: no prot: 0x5
2020-12-11 19:35:59.509 Df kernel[0:1f4918] CODE SIGNING: cs_invalid_page(0x7fff3ad17000): p=38037[WindowServer]
2020-12-11 19:35:59.509 Df kernel[0:1f4918] CODE SIGNING: cs_invalid_page(0x7fff3ad17000): p=38037[WindowServer] final status 0x23007b01, denying page sending SIGKILL
2020-12-11 19:35:59.509 Df kernel[0:1f4918] CODE SIGNING: process 38037[WindowServer]: rejecting invalid page at address 0x7fff3ad17000 from offset 0xb89e000 in file "/private/var/db/dyld/dyld_shared_cache_x86_64h" (cs_mtime:1605723499.64038983 == mtime:1605723499.64038983) (signed:0 validated:0 tainted:0 nx:0 wpmapped:0 dirty:1 depth:2)
2020-12-11 19:35:59.509 Df kernel[0:1f4918] CODESIGNING: vm_fault_enter(0x7fff3ad17000): *** INVALID PAGE ***
The "sending SIGKILL" message means that CS_KILL was set (note that cs_invalid_page hasn't changed in 10.15).
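For anyone who wants to check the "final status" value from the log by hand, here is a minimal sketch that decodes it into flag names. The bit values are an assumption on my side, recalled from XNU's cs_blobs.h and deliberately incomplete, so please verify them against the kernel source for your macOS version. `CS_FLAGS` and `decode_csflags` are hypothetical names used only for illustration.

```python
# Subset of code-signing flags. ASSUMPTION: values recalled from XNU's
# cs_blobs.h; verify against the kernel source before relying on them.
CS_FLAGS = {
    0x00000001: "CS_VALID",
    0x00000100: "CS_HARD",
    0x00000200: "CS_KILL",
    0x00000800: "CS_RESTRICT",
    0x00001000: "CS_ENFORCEMENT",
    0x00002000: "CS_REQUIRE_LV",
    0x01000000: "CS_KILLED",
    0x20000000: "CS_SIGNED",
}


def decode_csflags(status):
    """Return the names of the known flags that are set in `status`."""
    return [name for bit, name in CS_FLAGS.items() if status & bit]


# "final status" taken from the cs_invalid_page log line above.
final_status = 0x23007B01
flags = decode_csflags(final_status)
print(flags)
print("CS_KILL set:", "CS_KILL" in flags)
```

Under those assumed values, the 0x200 bit is set in 0x23007b01, which matches the SIGKILL seen in the log.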
@al3xtjames: try to add a boot-arg -liluuseroff.
@al3xtjames @lambdaupb could you check whether the offset found by UserPatcher::vmProtect is correct? Because it clearly strips CS_KILL from the process.
I'm not a C programmer and have no real Idea how to do that. If I'm provided with step-by-step instruction, I can repro this though.
This machine is my daily driver at the moment so I'm reluctant to dive into it since my issue was solved by removing the enable-hdmi20 setting.
The easiest test is to enable Lilu debug logging and create a debug log in /var/log/Lilu_x.x.x.txt via the -liludbgall liludump=60 boot arguments. Upload it here, and perhaps it will shed some light on the issue.
Lilu is using 308 as the offset for p_csflags.
Lilu_1.5.1_18.7.txt
@al3xtjames thanks a lot for the CoreDisplay fix in WEG. Would you mind providing some more information about the max-pixel-clock-frequency value? If you have time to update the Manual in WEG, that would be great.
I tried to reproduce on my NUC but couldn't. The system becomes laggy but not unresponsive, and it doesn't crash or even overheat. CPU usage went up and down; I guess that's part of the Large FFT torture test? I left it running for about 10 minutes whilst browsing GitHub and opening/closing the About This Mac dialog every now and then. My config can be found here.
As I mentioned here, I believe these forced logouts on 8th gen NUCs are due to missing ACPI patches and/or the OpenCore configuration used. But that's just my guess, since I have no issues and run multiple NUCs. I stress tested them quite heavily with stress-ng a few months ago. No problems whatsoever; these Kaby Lake NUCs are rock solid with OpenCore for me.
I'm running the latest versions of OpenCore/Lilu/etc and compiling everything from source now but also had no problems when I didn't do that and just used the release versions. Are there any other ways for me to try and reproduce this?
@zearp thank you for your attempt at reproducing this issue!
I think you have SIP disabled with
<key>csr-active-config</key>
<data>/wcAAA==</data>
where /wcAAA== (base64) is equal to ff 07 00 00 in hex, which according to Dortania
https://dortania.github.io/OpenCore-Install-Guide/troubleshooting/extended/post-issues.html#disabling-sip
disables all of SIP on Mojave / Catalina.
So code signing would be disabled and would not kill WindowServer.
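To double-check the base64 value without trusting my conversion, here is a short sketch that decodes it. Treating the four bytes as a little-endian 32-bit mask is an assumption on my part, and `decode_csr_active_config` is a hypothetical helper name, not something from OpenCore.

```python
import base64


def decode_csr_active_config(b64_value):
    """Decode a base64 csr-active-config value into raw bytes and an integer.

    ASSUMPTION: the bytes are interpreted as a little-endian bitmask.
    """
    raw = base64.b64decode(b64_value)
    return raw, int.from_bytes(raw, "little")


raw, mask = decode_csr_active_config("/wcAAA==")
print(" ".join(f"{b:02x}" for b in raw))  # raw bytes in hex
print(hex(mask))                          # the same value as one integer
```

With that interpretation the value comes out as 0x7ff, i.e. the eleven lowest bits set.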
@lambdaupb Good point! I have it disabled cuz I use VoltageShift. I just repeated the test with SIP enabled. It did run a little hotter but after ~10 minutes of running Prime95 and opening about this Mac and Launchpad/Notification Centre a bunch of times I didn't get any crash. The fading animation varies from smooth to choppy but nothing grinds to a halt.
I'm thinking that the logouts people experienced on the NUC may have nothing to do with this, which is why I can't reproduce. Unless it also happens to you on a NUC but it seems you're using a different mini computer. I'm only here cuz you mentioned this in a NUC issue I was still subscribed to haha. But I can't seem to reproduce it on my NUCs.
@zearp I have little experience with that setting, but could you check if SIP is really enabled? The Dortania guide mentions that it will not overwrite old values in NVRAM unless the property is listed in the Delete section as well.
Note: Disabling SIP with OpenCore is quite a bit different compared to Clover, specifically that NVRAM variables will not be overwritten unless explicitly told so under the Delete section. So if you've already set SIP once either via OpenCore or in macOS, you must override the variable:
NVRAM -> Block -> 7C436110-AB2A-4BBB-A880-FE41995C9F82 -> csr-active-config
@lambdaupb Yes, it was really enabled. I checked with csrutil status after rebooting and reset NVRAM in between boots for good measure. I was also prompted with a bunch of security warnings; those are due to VoltageShift, Intel Power Gadget and some other kexts I use. So my guess is that it's really turned on. Does this happen to you on a Kaby Lake NUC too, or only on your DeskMini?
My deskmini has a Coffee Lake R (I think) i5-8500 CPU.
There might be something else going on as well. The crash report of WindowServer clearly shows a code signing crash on the NUC
https://github.com/appleserial/NUC8I5BEH/issues/13
System Integrity Protection: enabled
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_ACCESS (Code Signature Invalid)
Exception Codes: 0x0000000000000032, 0x00007fff37028253
Exception Note: EXC_CORPSE_NOTIFY
Termination Reason: Namespace CODESIGNING, Code 0x2
kernel messages:
VM Regions Near 0x7fff37028253:
__TEXT 00007fff37009000-00007fff37028000 [ 124K] r-x/r-x SM=COW /System/Library/Frameworks/CoreDisplay.framework/Versions/A/CoreDisplay
--> __TEXT 00007fff37028000-00007fff37029000 [ 4K] r-x/rwx SM=COW /System/Library/Frameworks/CoreDisplay.framework/Versions/A/CoreDisplay
Submap 00007fff37029000-00007fff40000000 [143.8M] r--/rwx SM=PRV process-only VM submap
So the issue exists and is fixed by removing enable-hdmi20 for me on 10.15 and @al3xtjames on 10.14.
It might very well be a combination with another setting or ACPI patch that triggers it though.
@lambdaupb Yeah, that's my guess too. What I will do is try the EFI from the repo you linked and report back in a bit. When I wrote Kaby Lake I meant Coffee Lake, of course. I'm a pro at messing up those Intel codenames, sorry for any confusion it may have caused.
Thanks for the help. I will try to reproduce this issue with opencore updated to 0.6.4 and all other modules updated as well.
Let me be clear:
@lambdaupb Just ran the same tests using the EFI from the repo you linked and again no crashes, SIP is enabled and the hdmi setting too. I'm thinking these random logouts people experienced on the NUC have nothing to do with this issue, which would explain my failure to reproduce it. But it doesn't mean there is no issue of course. I don't have a DeskMini 310 to play with but it looks like a fun little machine so I hope you can get this sorted.
The issue with the WindowServer crash you linked seems to be solved by a comment on a blog that's linked, but I can't read the comment because the comments are not loading for me for some reason. I've not done any upgrading from 10.14.x to 10.15.x and have only ever used Catalina and Big Sur on my NUCs. Maybe those crashes were related to the upgrade or something else in their setup? I think this specific issue isn't present on the NUC Coffee Lake models, but do let me know if there's anything else I can try.
@zearp Hi, I can reproduce the WindowServer crash with your EFI and with the EFI from https://github.com/appleserial/NUC8I5BEH by running "Large FFTs". My NUC was upgraded from 10.14. Can you post the blog link? Thank you.
@likaci You can’t follow the link I referred to and find the blog post yourself? Please don't quote an entire post to only add a sentence.
Check whether you can also reproduce it on a system that wasn’t upgraded from 10.14.x, because no matter how long I let it run I get no crashes, and I installed Catalina directly on mine.
I don’t have a 10.14.x installer lying around to do a clean install with and then upgrade to Catalina, but I might try for the fun of it and see if I get crashes that way.
@zearp Sorry for the disturbance and my bad English. I have read the entire page but can't find the link that mentions that upgrading from 10.14 may cause the problem.
I have only one NUC, and it is running some services, so I can't reinstall it. I confirmed that disabling SIP or disabling HDMI 2.0 avoids the problem.
Thank you for your help, Happy new year.
I also had this problem in Big Sur. On my Skylake laptop, the CPU stayed at 1.5 GHz or more and kept overheating, leading to poor performance. It was resolved by turning off the enable-hdmi20 option. Thank you for the tip!
enable-hdmi20 is deprecated in favour of the max-pixel-clock feature (https://github.com/acidanthera/WhateverGreen/pull/79). Although the issue is not exclusive to the CDF side of WEG, userspace patching is implemented differently on Big Sur and above, and is not affected by this issue. I no longer use Catalina or older, and thus decided not to address this issue. Closing.
Does this mean that enable-max-pixel-clock-override replaces the enable-hdmi20 option? Will the option stay or will it be removed in future builds?
Because at the moment removing enable-hdmi20 and replacing it with enable-max-pixel-clock-override breaks 4k on Catalina and earlier.
It seems it's not doing the same as the hdmi20 option did. But I may have misunderstood and/or not implemented it properly.
You may need higher max-pixel-clock-frequency (in Hz, defaults to 675000000). https://github.com/acidanthera/WhateverGreen/blob/master/Manual/FAQ.IntelHD.en.md#hdmi-in-uhd-resolution-with-60fps
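As a rough way to judge whether a given mode fits under that limit, the pixel clock can be estimated as total horizontal pixels × total vertical lines × refresh rate. The sketch below is only an illustration: the 4400 × 2250 totals are the commonly quoted timing for 3840×2160 at 60 Hz and are my assumption, not something taken from this thread, so use the timings your display actually reports. The function and constant names are hypothetical.

```python
# Default limit quoted above, in Hz.
DEFAULT_MAX_PIXEL_CLOCK_HZ = 675_000_000


def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Estimate the pixel clock from total (active + blanking) timings."""
    return h_total * v_total * refresh_hz


def fits_limit(clock_hz, limit_hz=DEFAULT_MAX_PIXEL_CLOCK_HZ):
    """Return True if the mode's pixel clock does not exceed the limit."""
    return clock_hz <= limit_hz


# ASSUMPTION: 4400 x 2250 total timing for 3840x2160 @ 60 Hz.
uhd60 = pixel_clock_hz(4400, 2250, 60)
print(uhd60, fits_limit(uhd60))
```

A mode whose estimated clock comes out above the limit would be a candidate for raising max-pixel-clock-frequency; whether that is the cause in any particular setup is not something this calculation can confirm.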
see https://github.com/csrutil/DeskMini/issues/10
DeskMini 310, i5-8500 UHD630, Catalina 10.15.7, Opencore 0.6.3
related code (probably): https://github.com/acidanthera/WhateverGreen/blob/7d30dd8a624d0d3b2d4882fcc689b9db4964efd5/WhateverGreen/kern_cdf.cpp#L182
enable-hdmi20 patches CoreDisplay at runtime. In a high memory pressure situation, it apparently happens that the CoreDisplay library memory is moved to swap. When the library memory is reloaded into RAM, a code signing check is done and fails, causing a WindowServer crash.
I am able to reproduce this by using Prime95 > Torture Test > Large FFTs, which allocates almost all of system memory, and then doing some UI stuff involving animations etc. (~1 min).
Possible fixes
logs