Open valentiniliescu opened 2 years ago
Tagging subscribers to this area: @vitek-karas, @agocke, @vsadov See info in area-owners.md if you want to be subscribed.
| Author: | valentiniliescu |
|---|---|
| Assignees: | - |
| Labels: | `area-Host`, `untriaged` |
| Milestone: | - |
Interesting, that could possibly explain why the automated .NET installation on my self-hosted macOS CI agents fails every month. When the new SDK is released it overwrites the existing one but due to some funky aspect of the code signature verification (modification time matters for ad-hoc signing but possibly affects the cache for other signatures too) it results in "137" kills from the system for invalid signature. I always have to go to the machines, delete the whole dotnet installation and let it redownload again.
In our case, we have some certificates in the keychain that we give dotnet permission to access. A .NET version upgrade caused dotnet to not be able to access the certificates, because the identifier changed.
Btw, we have the same scenario and also experience this.
Unrelated to that, there seems to be a bug where deleting the certificates doesn't delete the private keys from the keychain, but I haven't narrowed that one down yet.
Are you using `security delete-certificate` to delete the certificates? Try `security delete-identity`, which deletes the certificate AND the private keys, and takes the same arguments.
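The difference between the two subcommands can be sketched as a small wrapper. This is only an illustration, assuming the certificate is selected by common name with `-c`; the helper name `build_delete_command` and the example certificate name are hypothetical.

```python
from typing import List, Optional


def build_delete_command(common_name: str,
                         keychain: Optional[str] = None,
                         include_private_key: bool = True) -> List[str]:
    """Build a `security` invocation that deletes a certificate by common name.

    delete-identity removes the certificate and its private key;
    delete-certificate removes only the certificate.
    """
    subcommand = "delete-identity" if include_private_key else "delete-certificate"
    command = ["security", subcommand, "-c", common_name]
    if keychain:
        # Optional keychain to search; omitted means the default search list.
        command.append(keychain)
    return command


if __name__ == "__main__":
    # "CI Test Certificate" is a made-up name for illustration.
    print(" ".join(build_delete_command("CI Test Certificate")))
```

On a macOS machine the resulting list could be passed to `subprocess.run(..., check=True)`.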
Thanks for the tip, I do the manual cleanup with the `security` tool (since the Keychain Access UI struggles to handle the number of items that are generated by the constant CI runs).
However, we primarily rely on the `X509Store` managed API to inject and delete the certificates within the unit tests. It's on my backlog to find out what causes the private keys not to be deleted. The same code works on Windows. There were already some fixes in .NET 7 to insert a couple of missing `CFRelease` calls on the identity objects, but I don't know if they fixed the issue or not.
I don't necessarily want to spam this issue with unrelated content. Just wanted to point out that we are also running into the scenario with keychain permissions on .NET updates.
@adiaaida @mmitche - this is something that needs to happen both in the runtime and in the staging pipeline. Essentially, we need to specify the `-i` parameter to `codesign` in `osx_signing_operations.py` per image/dylib.
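As a rough sketch of what passing `-i` per image might look like (this is not the actual contents of `osx_signing_operations.py`, which is not shown in this thread): the helper name, the signing identity, and the `com.example.*` identifiers below are all hypothetical placeholders.

```python
from typing import Dict, List


def build_codesign_command(binary_path: str, identity: str, identifier: str) -> List[str]:
    """Build a codesign invocation that pins the signing identifier with -i."""
    return ["codesign", "--force", "--sign", identity, "-i", identifier, binary_path]


if __name__ == "__main__":
    # Hypothetical mapping of images to stable identifiers; real names and
    # paths would come from the signing pipeline.
    stable_identifiers: Dict[str, str] = {
        "dotnet": "com.example.dotnet",
        "libhostfxr.dylib": "com.example.libhostfxr",
    }
    for image, identifier in stable_identifiers.items():
        # "-" requests an ad-hoc signature; a real pipeline would pass its
        # own signing identity here.
        print(" ".join(build_codesign_command(image, "-", identifier)))
```

With an explicit identifier, the value no longer depends on anything that changes between builds of the binary.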
Tagging subscribers to this area: @dotnet/runtime-infrastructure See info in area-owners.md if you want to be subscribed.
| Author: | valentiniliescu |
|---|---|
| Assignees: | - |
| Labels: | `area-Infrastructure`, `untriaged` |
| Milestone: | - |
cc: @richlander since you've done a lot of the osx-arm64 UX definition work around installation. I think this might be something we should fix on the LTS?
@MattGal you have the most context on the macOS signing scripts, can you help out with @hoyosjs's comment?
I haven't touched the code signing stuff since almost a year before 6.0.7 shipped, but if I can help I will. Not sure what the ask here is yet.
Interesting issue. We first need to figure out the right behavior. W/o that (since I'm now OOF), this looks like a candidate for servicing.
Who do I need to bribe to fix this? I have to manually fix all of our self-hosted CI machines every month on Patch Tuesday. The `actions/setup-dotnet` action on GitHub Actions overwrites the existing installation, which trips the macOS code signing. Any attempt to run `dotnet` results in an instant kill with exit code 137. So I have to go to all the CI machines and do `rm -fr ~/.dotnet` to fix this. Then I have to give keychain permissions to the new `dotnet` executable on all the machines. Repeat next month, every month.
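The manual cleanup described above could in principle be automated on the CI machines. A minimal sketch, assuming the broken install shows up as `dotnet --version` being killed with SIGKILL (exit code 137 in a shell, which is 128 + 9); the function names are hypothetical, and re-granting keychain permissions is not covered.

```python
import shutil
import subprocess
from pathlib import Path

SIGKILL = 9


def was_sigkilled(returncode: int) -> bool:
    """True if a return code indicates the process was killed with SIGKILL.

    Python's subprocess reports death by signal N as -N, while a shell
    reports it as 128 + N (137 for SIGKILL).
    """
    return returncode in (-SIGKILL, 128 + SIGKILL)


def reset_if_killed(install_dir: Path) -> bool:
    """Delete install_dir if its dotnet executable is killed on launch.

    Returns True when the directory was removed.
    """
    dotnet = install_dir / "dotnet"
    if not dotnet.exists():
        return False
    result = subprocess.run([str(dotnet), "--version"],
                            capture_output=True, check=False)
    if was_sigkilled(result.returncode):
        # Equivalent to the manual `rm -fr ~/.dotnet` step.
        shutil.rmtree(install_dir)
        return True
    return False


if __name__ == "__main__":
    print(reset_if_killed(Path.home() / ".dotnet"))
```

Running this before the installer step would let the next install start from a clean directory instead of overwriting files in place.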
Just spent a few hours resolving an incident caused by a break in our logic around making sure we re-trust dotnet every time we download it, but that break wasn't revealed until today's 8.0.7 release came out and the identifier changed.
The break in our logic is obviously my team's fault, but if this issue didn't exist we wouldn't have to maintain any complicated trust logic. I would appreciate this getting fixed so that I don't ever have to fight this again.
Description
On macOS, the dotnet executable is signed with a different identifier for each version. The identifier should be stable, as recommended in Apple's Code Signing Guide: "When you have qualified a new version of your product, sign it just as you signed the previous version, with the same identifier and the same designated requirement. The user’s system considers the new version of your product to be the same program as the previous version. For example, Keychain Services does not distinguish older and newer versions of your program as long as both are signed and the unique Identifier remains constant."
In our case, we have some certificates in the keychain that we give dotnet permission to access. A .NET version upgrade caused dotnet to not be able to access the certificates, because the identifier changed.
Reproduction Steps
1. With one .NET version installed, run: codesign -dvv /usr/local/share/dotnet/dotnet
2. Upgrade to another .NET version and run it again: codesign -dvv /usr/local/share/dotnet/dotnet
Expected behavior
The Identifier for both versions is the same.
Actual behavior
The Identifier is different:
dotnet-555549440327671c9566330788e80d8e2b4b60b5
dotnet-555549440d4af8f3fef0388bb73cd4b37a5a9195
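The check in the reproduction steps can be scripted. The following is a minimal sketch, assuming `codesign -dvv` prints an `Identifier=` line on stderr; the helper names `parse_identifier` and `read_identifier` are hypothetical.

```python
import re
import subprocess
from typing import Optional


def parse_identifier(codesign_output: str) -> Optional[str]:
    """Return the value of the Identifier= line in `codesign -dvv` output."""
    match = re.search(r"^Identifier=(.+)$", codesign_output, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def read_identifier(path: str) -> Optional[str]:
    """Run `codesign -dvv` on path (macOS only) and parse the identifier."""
    # codesign writes the signature details to stderr, not stdout.
    result = subprocess.run(["codesign", "-dvv", path],
                            capture_output=True, text=True, check=False)
    return parse_identifier(result.stderr)


if __name__ == "__main__":
    # The two identifiers reported in this issue.
    before = parse_identifier(
        "Identifier=dotnet-555549440327671c9566330788e80d8e2b4b60b5\n")
    after = parse_identifier(
        "Identifier=dotnet-555549440d4af8f3fef0388bb73cd4b37a5a9195\n")
    print("stable" if before == after else "identifier changed")
```

Storing the value from `read_identifier` before an upgrade and comparing it afterwards would flag the change before keychain access starts failing.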
Regression?
No response
Known Workarounds
No response
Configuration
.NET SDK:
  Version: 6.0.302
  Commit: c857713418

Runtime Environment:
  OS Name: Mac OS X
  OS Version: 12.5
  OS Platform: Darwin
  RID: osx.12-arm64
  Base Path: /usr/local/share/dotnet/sdk/6.0.302/

Host:
  Version: 6.0.7
  Architecture: arm64
  Commit: 0ec02c8c96
Other information
No response