Open dumkin opened 9 months ago
Tested on x86 and Arm macOS; it crashes everywhere when using .NET 8.
Please take a look, thank you! @dotnet/ncl , @kotlarmilos
I can reproduce this against net8.0, but not main.
Native stack:
frame #0: 0x000000018c3020dc libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x000000018c339cc0 libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x000000018c245a40 libsystem_c.dylib`abort + 180
frame #3: 0x000000018c15cb08 libsystem_malloc.dylib`malloc_vreport + 908
frame #4: 0x000000018c17c24c libsystem_malloc.dylib`malloc_zone_error + 104
frame #5: 0x000000018c17b0a8 libsystem_malloc.dylib`free_tiny_botch + 40
frame #6: 0x000000019cc33b1c GSS`_gss_scram_release_cred + 176
frame #7: 0x000000019cbf0dec GSS`_gss_mg_release_cred + 124
frame #8: 0x000000018c4e9b90 CoreFoundation`_CFRelease + 292
frame #9: 0x000000019cbf0d48 GSS`gss_release_cred + 76
The managed stack
0000000171005C10 0000000105b0cd28 (MethodDesc 0000000105f3cb50 + 0x98 Microsoft.Win32.SafeHandles.SafeGssCredHandle.ReleaseHandle())
0000000171005CB0 051880019cbf0d48 051880019cbf0d48, calling 051880019cc37d00
0000000171005CE0 703a000105b0cd28 703a000105b0cd28
0000000171005CF0 0000000101f2bfc4 (MethodDesc 0000000102ba3038 + 0x114 System.Runtime.InteropServices.SafeHandle.InternalRelease(Boolean))
0000000171005D30 0000000105b0cd0c (MethodDesc 0000000105f3cb50 + 0x7c Microsoft.Win32.SafeHandles.SafeGssCredHandle.ReleaseHandle())
0000000171005DE0 0000000101f2bdbc (MethodDesc 0000000102ba2fc0 + 0x2c System.Runtime.InteropServices.SafeHandle.Dispose())
0000000171005E20 0000000105b103f8 (MethodDesc 0000000105f3b1e8 + 0x28 System.Net.NegotiateAuthenticationPal+UnixNegotiateAuthenticationPal.Dispose())
0000000171005E40 0000000105b1069c (MethodDesc 0000000105f3b1f8 + 0x25c System.Net.NegotiateAuthenticationPal+UnixNegotiateAuthenticationPal.GetOutgoingBlob(System.ReadOnlySpan`1<Byte>, System.Net.Security.NegotiateAuthenticationStatusCode ByRef))
0000000171005E80 0000000105b1bf4c (MethodDesc 0000000105cfac30 + 0x3c System.Net.Security.NegotiateAuthentication.GetOutgoingBlob(System.ReadOnlySpan`1<Byte>, System.Net.Security.NegotiateAuthenticationStatusCode ByRef))
0000000171005ED0 0000000105b1c0e4 (MethodDesc 0000000105cfac48 + 0x84 System.Net.Security.NegotiateAuthentication.GetOutgoingBlob(System.String, System.Net.Security.NegotiateAuthenticationStatusCode ByRef))
Tagging subscribers to this area: @dotnet/ncl, @bartonjs, @vcsjones. See info in area-owners.md if you want to be subscribed.
Author: dumkin
Assignees: -
Labels: `area-System.Net.Security`, `untriaged`, `needs-area-label`
Milestone: -
/cc @filipnavara in case something jumps out at you.
I can reproduce this against net8.0, but not main.
main switched the implementation of NegotiateAuthentication from full GSSAPI to managed SPNEGO + managed NTLM + Kerberos through GSSAPI for all Apple platforms. That radically reduces the surface where we depend on the system's GSSAPI implementation. While this particular memory corruption is not something I recognize, I have a number of unresolved Apple feedback items for different buffer overruns in the GSSAPI implementation. Obviously I cannot rule out an API misuse on our side.
On .NET 8 it's possible to opt in to the managed NTLM/SPNEGO implementation by adding this item to the .csproj file: <RuntimeHostConfigurationOption Include="System.Net.Security.UseManagedNtlm" Value="true" />
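For context, here is a minimal sketch of where that item would sit in a project file. Only the RuntimeHostConfigurationOption line comes from this thread; the surrounding project (output type, target framework) is an assumed placeholder for illustration.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Assumed placeholder values; use your project's own settings. -->
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <!-- Opt in to the managed NTLM/SPNEGO implementation (switch named in this thread). -->
    <RuntimeHostConfigurationOption Include="System.Net.Security.UseManagedNtlm" Value="true" />
  </ItemGroup>
</Project>
```

RuntimeHostConfigurationOption items are written into the generated runtimeconfig.json at build time, so the setting travels with the published app rather than with the machine.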
Did the repro work for you, @filipnavara? I could not reproduce it so far on my (Intel) MacBook. I simply get a 401 back.
And perhaps if you do have a repro, @vcsjones, could you try it with 6.0 or 7.0? I'm wondering if this is an 8.0 regression or if the issue always existed.
@wfurt Hello, have you tried running it several times? I wrote in the repo's readme that the bug does not seem to be stable: it almost always ends in an error, but sometimes it really just returns 401. On .NET 7 it works stably; I checked that specifically, so this is a regression. In my company the bug was reproduced by at least 3 people, on both Arm and Intel processors.
Yes, I did run it several times, @dumkin. I can let it sit in a loop, and I can possibly also get my hands on an Arm Mac.
And perhaps if you do have a repro, @vcsjones, could you try it with 6.0 or 7.0? I'm wondering if this is an 8.0 regression or if the issue always existed.
I have similar results as @dumkin.
true for System.Net.Security.UseManagedNtlm: No crashes in 10 runs
false for System.Net.Security.UseManagedNtlm: Crashes consistently under LLDB
Since it may be sensitive to the macOS environment:
❯ sw_vers
ProductName: macOS
ProductVersion: 14.3
BuildVersion: 23D56
This is on an M1.
I'm wondering if this is an 8.0 regression
It seems that way. main "fixes" the issue by using the managed implementation, but if you turn it off, the issue comes back.
@vcsjones Thanks for the summary. I will investigate tomorrow morning.
Update: I can reproduce this.
This is starting to ring some bells. The macOS implementation of gss_init_sec_context has the habit of destroying the context handle on error and replacing it with an invalid one. One must absolutely not touch the old handle afterwards. I originally fixed this in https://github.com/dotnet/runtime/pull/71484 but apparently it regressed during the rewrite somewhere here:
FWIW Apple updated the code last September and added more of the early frees here - https://github.com/apple-oss-distributions/Heimdal/commit/07a31139e65ae4337c3bf328b3cd1465bb2a84b0#diff-c60d25db88cb547db736ca8f6a23712a4ce2bf2be392eb265878d5b7b2f020f7 - which could explain why it's only happening on some versions of macOS.
We have a new Apple bug here. Yay!
In the call to gss_init_sec_context we pass our cred_handle to GSSAPI. Since we are doing the SPNEGO protocol, it's distributed to the underlying mechanisms, one of which happens to be the "DIGEST" one. The implementation of _gss_scram_init_sec_context proceeds to save the cred_handle without making a copy:
https://github.com/apple-oss-distributions/Heimdal/blob/48f86d0ceef220f75b16f0fc8266b53d50129c38/lib/gssapi/digest/init_sec_context.c#L324
https://github.com/apple-oss-distributions/Heimdal/blob/48f86d0ceef220f75b16f0fc8266b53d50129c38/lib/gssapi/digest/init_sec_context.c#L344
When the authentication inevitably fails, the gss_init_sec_context API goes on to delete the whole security context:
* frame #0: 0x0000000192090a94 GSS`_gss_scram_release_cred
frame #1: 0x000000019208f1c0 GSS`_gss_scram_delete_sec_context + 124
frame #2: 0x000000019204e5a0 GSS`gss_delete_sec_context + 256
frame #3: 0x0000000192055448 GSS`_gss_spnego_internal_delete_sec_context + 308
frame #4: 0x000000019204f0c8 GSS`_gss_spnego_init_sec_context + 488
frame #5: 0x0000000192047ca8 GSS`gss_init_sec_context + 1236
and _gss_scram_delete_sec_context deletes the cred_handle that is in our ownership.
When we later call gss_release_cred, like a good citizen, it inevitably crashes with a double free. This bug was introduced with Heimdal-682 (committed to the OSS repository around 5 months ago).
Submitted to Apple as FB13600619
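The ownership problem described above can be illustrated with a small toy model. This is only a sketch: ToyHeap, ToyMechanism, and Demo are hypothetical names, nothing here calls the real GSSAPI, and the "heap" just tracks which handles are live so that a second free is detected instead of aborting the process.

```csharp
using System;
using System.Collections.Generic;

// Toy allocator that rejects a double free, standing in for the malloc abort seen in the native stack.
sealed class ToyHeap
{
    private readonly HashSet<int> _live = new();
    private int _next = 1;

    public int Alloc() { int id = _next++; _live.Add(id); return id; }

    public void Free(int id)
    {
        if (!_live.Remove(id))
            throw new InvalidOperationException($"pointer being freed was not allocated: {id}");
    }
}

// Hypothetical stand-in for the DIGEST mechanism; not the real Heimdal code.
sealed class ToyMechanism
{
    private readonly ToyHeap _heap;
    private int _savedCred; // borrowed from the caller, never copied

    public ToyMechanism(ToyHeap heap) => _heap = heap;

    // Returns false to model the authentication failure.
    public bool InitSecContext(int callerCred)
    {
        _savedCred = callerCred; // bug: stores the caller's handle without taking a copy
        DeleteSecContext();      // the failure path tears the whole context down
        return false;
    }

    private void DeleteSecContext() => _heap.Free(_savedCred); // frees a handle the caller still owns
}

static class Demo
{
    public static bool CallerReleaseCrashes()
    {
        var heap = new ToyHeap();
        int cred = heap.Alloc();                         // caller acquires the credential
        new ToyMechanism(heap).InitSecContext(cred);     // library frees it behind the caller's back
        try { heap.Free(cred); return false; }           // caller releases what it owns
        catch (InvalidOperationException) { return true; } // second free is detected
    }

    public static void Main() => Console.WriteLine(CallerReleaseCrashes()); // prints "True"
}
```

In this model the caller does nothing wrong: it frees exactly what it allocated, once. The failure comes from the callee freeing a handle it only borrowed.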
Finally, to answer why it doesn't crash on .NET 6/7... It actually does, if you run it in a loop and force handles to be finalized with GC.Collect():
for (;;)
{
try { await client.GetStringAsync(""); }
catch (System.Net.Http.HttpRequestException) { GC.Collect(); }
}
We were just leaking the handles and depending on finalization instead of releasing them deterministically.
To follow this ticket
I'm having the same problem when using the Microsoft.Exchange.WebServices.Data.EmailMessage.SendAndSaveCopy() method on macOS Sonoma 14.2.1 and .NET 8.
Unless I am mistaken, it seems we cannot do anything about this bug from .NET and it needs to be fixed in the GSSAPI codebase.
@filipnavara, am I right? Also, is there a link where the status of the issue you filed can be tracked?
Unfortunately Apple Feedback is private. I will share any update as soon as I receive it but there has been no response from Apple so far.
I don’t think a workaround is possible on .NET side aside from the aforementioned UseManagedNtlm switch which bypasses the whole Apple SPNEGO implementation.
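As a sketch of a programmatic alternative to the project-file setting, the same switch name can be set through AppContext at startup. This assumes the runtime reads System.Net.Security.UseManagedNtlm as an AppContext switch and has not yet cached its value, which is not confirmed in this thread; the helper class name is hypothetical.

```csharp
using System;

// Hypothetical helper; only the switch name comes from this thread.
static class ManagedNtlmSwitch
{
    public const string Name = "System.Net.Security.UseManagedNtlm";

    // Assumption: must run before the first Negotiate/NTLM authentication in the process,
    // because the runtime may read the switch once and cache the result.
    public static void Enable() => AppContext.SetSwitch(Name, true);

    // Reports whether the switch is currently set to true in this process.
    public static bool IsEnabled() =>
        AppContext.TryGetSwitch(Name, out bool enabled) && enabled;
}
```

Calling ManagedNtlmSwitch.Enable() as the first statement of Main would be the intended usage. The project-file item remains the route actually described in this thread.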
could we simply duplicate the credentials @filipnavara? Even if we play some weird games, could we prevent the crash?
You can leak the memory but you cannot reliably fix it. The SPNEGO doesn’t always end up in the Digest code path. That depends on how far you get with the authentication and possibly the negotiated algorithms. There’s no easy way to detect it and even if we somehow manage to detect it then it would start leaking once Apple fixes it.
Is there a way to tell their stack to skip including DIGEST in the negotiate mechs? That would certainly simplify the problem.
I checked the code (https://github.com/apple-oss-distributions/Heimdal/blob/48f86d0ceef220f75b16f0fc8266b53d50129c38/lib/gssapi/spnego/compat.c#L247-L351; callers and callees) and I didn't find any public API to do so.
Since there is no easy workaround and the crash is pretty nasty, I'm wondering if we should port System.Net.Security.UseManagedNtlm to 8.0, and perhaps even flip the switch.
I was hesitant to take it late in 8.0, but we did not have crashes at the time, and this may be out for a long time (LTS) without a mitigation.
Any thoughts on this, @karelz?
This currently causes issues on macOS and iOS (https://github.com/dotnet/runtime/issues/99892).
Please make UseManagedNtlm = True the default in 8.0 LTS.
It took us ages to trace this back to NTLM credentials. Most people might not be able to trace it back at all, since it causes random crashes.
Folks, if you are impacted by this issue, can you please add upvote on top post? It will be easier to track number of people impacted. Thanks!
We have seen the crashes on .NET 7 and .NET 8 in our app. The problem is that it's not easily traceable to the root cause. I expect the number of people affected to be high. Just my $.02.
How big would the change be to introduce the switch in 8.0, @filipnavara? I assume not that big, as we already do it for Android? We discussed the possibility of servicing with @karelz. Changing the default may be difficult to push through (but we can try).
The runtime part of the switch is already there, we just miss the SDK MSBuild part (https://github.com/dotnet/sdk/pull/34903). Possibly we would need to backport some of the Managed NTLM fixes in dotnet/runtime which were usually small and targeted.
@tobyperplex
Add upvote on top post... Do you mean the post with the most upvotes (by wfurt) or the one at the top of the page (by dumkin)?
I meant original post at the very top by dumkin
FWIW I tried to burn one of my paid support requests for code-level support. Today I received a reply that there's no workaround, the feedback is still under investigation, and I got the support request credited back to my account.
Folks, if you are impacted by this issue, can you please add upvote on top post? It will be easier to track number of people impacted. Thanks!
We've reached 20 upvotes! 🎉
I didn't receive any update from Apple but the bug seems to be fixed in macOS 15.0 Beta 1 (24A5264n). The small repro code I sent them no longer crashes.
It is also fixed on macOS 14.5 in the latest update.
Wow, you're right! macOS 14.5 came out on May 13th, 2024, so it's already been working for a month, haha. I guess everybody thought we just had to wait for the fix to be implemented in the .NET 8 SDK.
Thanks, @filipnavara, for taking this up with Apple. Should we close this as resolved? Perhaps @dumkin can also verify as the original reporter.
The related problem, where I get the same error during dotnet restore when I have credentials for a NuGet package source in my NuGet config, is unfortunately still happening with macOS 14.5 and .NET SDK 8.0.302.
What is your stack trace, @Scyarah? Do you have any simplified repro?
Unfortunately not that easily, because it relies on having a NuGet package source other than the standard one and using PackageSourceCredentials for it.
I currently need this for my workflow, because we are using the artifact storage of a local Azure DevOps instance for our NuGet packages. With a workaround of pinning the SDK version to 7.x via global.json, I can currently still use it.
Sorry for not being able to provide more.
Then I think you should try to reproduce it under a debugger and/or get a dump. I think it will be difficult to investigate otherwise, @Scyarah.
FWIW I have the same issue. Creating a new project via dotnet new webapi crashes no matter which framework version is assigned via -f FRAMEWORK when our organization's internal NuGet source (which uses authentication) is enabled. Once it's disabled, everything works.
If there's any way I can provide information about it, please point me in the right direction.
@lbelloq are you still encountering this issue? You should be able to run the dotnet executable under a debugger (lldb?) and then dump the call stack when it crashes.
Something like this:
lldb -o run -- /path/to/dotnet new webapi
and then use the bt command to get a backtrace when it crashes.
Because I just encountered the error again, I ran dotnet restore within lldb in my repository.
(lldb) target create /usr/local/share/dotnet/dotnet
Current executable set to '/usr/local/share/dotnet/dotnet' (arm64).
(lldb) run restore
Process 32478 launched: '/usr/local/share/dotnet/dotnet' (arm64)
Determining projects to restore...
Process 32478 stopped
* thread #34, name = '.NET TP Worker', stop reason = signal SIGUSR1
frame #0: 0x0000000100d19874 libcoreclr.dylib`YieldProcessorNormalization::ScheduleMeasurementIfNecessary() + 68
libcoreclr.dylib`YieldProcessorNormalization::ScheduleMeasurementIfNecessary:
-> 0x100d19874 <+68>: b 0x100d198b0 ; <+128>
0x100d19878 <+72>: adrp x8, 710
0x100d1987c <+76>: add x8, x8, #0xf90 ; YieldProcessorNormalization::s_isMeasurementScheduled
0x100d19880 <+80>: ldrb w8, [x8]
Target 0: (dotnet) stopped.
(lldb) continue
Process 32478 resuming
Process 32478 stopped
* thread #34, name = '.NET TP Worker', stop reason = signal SIGUSR1
frame #0: 0x0000000103e12dd8
-> 0x103e12dd8: udiv w0, w23, w20
0x103e12ddc: msub w23, w0, w20, w23
0x103e12de0: cmp w20, #0x0
0x103e12de4: b.le 0x103e12eec
Target 0: (dotnet) stopped.
(lldb) continue
Process 32478 resuming
Process 32478 stopped
* thread #34, name = '.NET TP Worker', stop reason = signal SIGUSR1
frame #0: 0x0000000103e12dd8
-> 0x103e12dd8: udiv w0, w23, w20
0x103e12ddc: msub w23, w0, w20, w23
0x103e12de0: cmp w20, #0x0
0x103e12de4: b.le 0x103e12eec
Target 0: (dotnet) stopped.
(lldb) continue
Process 32478 resuming
dotnet(32478,0x170ccb000) malloc: *** error for object 0x6000031ce220: pointer being freed was not allocated
dotnet(32478,0x170ccb000) malloc: *** set a breakpoint in malloc_error_break to debug
Process 32478 stopped
* thread #29, name = '.NET TP Worker', stop reason = signal SIGABRT
frame #0: 0x0000000185f615f0 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`:
-> 0x185f615f0 <+8>: b.lo 0x185f61610 ; <+40>
0x185f615f4 <+12>: pacibsp
0x185f615f8 <+16>: stp x29, x30, [sp, #-0x10]!
0x185f615fc <+20>: mov x29, sp
Target 0: (dotnet) stopped.
The backtrace
* thread #29, name = '.NET TP Worker', stop reason = signal SIGABRT
* frame #0: 0x0000000185f615f0 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x0000000185f99c20 libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x0000000185ea6a30 libsystem_c.dylib`abort + 180
frame #3: 0x0000000185db6dc4 libsystem_malloc.dylib`malloc_vreport + 896
frame #4: 0x0000000185dba430 libsystem_malloc.dylib`malloc_report + 64
frame #5: 0x0000000185dd4494 libsystem_malloc.dylib`find_zone_and_free + 528
frame #6: 0x0000000196cc3b5c GSS`_gss_scram_release_cred + 176
frame #7: 0x0000000196c80468 GSS`_gss_mg_release_cred + 124
frame #8: 0x000000018614be90 CoreFoundation`_CFRelease + 292
frame #9: 0x0000000196c803c4 GSS`gss_release_cred + 76
frame #10: 0x000000010856cd28
frame #11: 0x000000010b9f828c
frame #12: 0x000000010c3b625c
frame #13: 0x00000001085703f8
frame #14: 0x000000010857baf8
frame #15: 0x00000001082704a0
frame #16: 0x000000010ba16acc
frame #17: 0x000000010c0ede24
frame #18: 0x000000010ba168d4
frame #19: 0x000000010ba16730
frame #20: 0x000000010c101ef4
frame #21: 0x000000010c0ff1f8
frame #22: 0x000000010c354050
frame #23: 0x0000000108282964
frame #24: 0x000000010ba166e4
frame #25: 0x000000010c0ede24
frame #26: 0x000000010ba164ec
frame #27: 0x000000010ba16348
frame #28: 0x000000010c101ef4
frame #29: 0x000000010c0ff1f8
frame #30: 0x000000010c39a00c
frame #31: 0x0000000103e8fbc4
frame #32: 0x000000010827ffb4
frame #33: 0x000000010ba1610c
frame #34: 0x000000010c0ede24
frame #35: 0x000000010ba15f70
frame #36: 0x000000010ba15e08
frame #37: 0x0000000108599788
frame #38: 0x000000010ba15dbc
frame #39: 0x000000010c0ede24
frame #40: 0x000000010ba15d5c
frame #41: 0x0000000108597d64
frame #42: 0x000000010ba0e4c4
frame #43: 0x000000010c0ede24
frame #44: 0x000000010ba0e414
frame #45: 0x000000010c297b28
frame #46: 0x000000010c0cdd08
frame #47: 0x000000010c0bd110
frame #48: 0x0000000103e200d4
frame #49: 0x0000000103df9350
frame #50: 0x0000000100dd3444 libcoreclr.dylib`CallDescrWorkerInternal + 132
frame #51: 0x0000000100c4e450 libcoreclr.dylib`DispatchCallSimple(unsigned long*, unsigned int, unsigned long long, unsigned int) + 268
frame #52: 0x0000000100c61794 libcoreclr.dylib`ThreadNative::KickOffThread_Worker(void*) + 148
frame #53: 0x0000000100c20988 libcoreclr.dylib`ManagedThreadBase_DispatchOuter(ManagedThreadCallState*) + 256
frame #54: 0x0000000100c20ee0 libcoreclr.dylib`ManagedThreadBase::KickOff(void (*)(void*), void*) + 32
frame #55: 0x0000000100c6186c libcoreclr.dylib`ThreadNative::KickOffThread(void*) + 172
frame #56: 0x0000000100b39a40 libcoreclr.dylib`CorUnix::CPalThread::ThreadEntry(void*) + 380
frame #57: 0x0000000185f99f94 libsystem_pthread.dylib`_pthread_start + 136
Not sure if this information from lldb is actually helpful. I'm not experienced with debugging binaries.
My system information
.NET SDK:
Version: 8.0.302
Commit: ef14e02af8
Workload version: 8.0.300-manifests.f6879a9a
MSBuild version: 17.10.4+10fbfbf2e
Runtime Environment:
OS Name: Mac OS X
OS Version: 14.6
OS Platform: Darwin
RID: osx-arm64
Base Path: /usr/local/share/dotnet/sdk/8.0.302/
@Scyarah The backtrace seems to be from the same problem as before, the double free via gss_release_cred. This is strange because, as per @filipnavara's comment, the issue should already have been fixed on Apple's side in the OS version you are running.
I’ll try to do more tests across various macOS versions next week.
I have the same problem on macOS 14.6 and macOS 14.6.1, working with .NET MAUI and NuGet packages. It is very time consuming: nearly every time there is a problem with MAUI itself, or with MAUI in combination with Apple, you get a segmentation fault and have to try every little part of your project to see what the problem really is.
I have updated to the .NET 9 SDK (9.0.100-rc.2.24474.11) and now it seems to be working again.
.NET 9 uses the managed NTLM and SPNEGO implementation on macOS by default so it would not hit the issue.
Description
Our company uses Microsoft Exchange email. To communicate with it we have a separate service that authenticates using HttpClientHandler.Credentials. After updating to .NET 8, the application crashed on any request. After investigating, I was able to reduce it to a minimal environment that reproduces the error. The application almost always crashes, and it seems to happen only on the macOS runtime.
Error:
Reproduction Steps
I have prepared a repository that allows you to reproduce the error
https://github.com/dumkin/net8-credentials-crash
Also you may run code
Expected behavior
HttpClient authenticates and works correctly
Actual behavior
Application crashed
Regression?
Yes, it worked on .NET 7
Known Workarounds
No
Configuration
Other information
No response