getsentry / sentry-dotnet

Sentry SDK for .NET
https://docs.sentry.io/platforms/dotnet
MIT License
579 stars 206 forks source link

Installation ID is constantly being reevaluated #3524

Closed klemmchr closed 1 month ago

klemmchr commented 1 month ago

Package

Sentry

.NET Flavor

.NET

.NET Version

8.0.0

OS

Linux

SDK Version

4.9.0

Self-Hosted Sentry Version

No response


I'm observing repeated log errors in my ASP.NET Core app.

fail: Sentry.ISentryClient[0]
      => SpanId:1423812b5249bd8d, TraceId:11f7f8ef457099059955f79b83b98f6f, ParentId:0000000000000000 => ConnectionId:0HN5LM26FI76P => RequestPath:/healthz RequestId:0HN5LM26FI76P:00000001
      Failed to resolve persistent installation ID.
      System.UnauthorizedAccessException: Access to the path '/app/Sentry' is denied.
       ---> System.IO.IOException: Permission denied
         --- End of inner exception stack trace ---
         at System.IO.FileSystem.CreateParentsAndDirectory(String fullPath, UnixFileMode unixCreateMode)
         at System.IO.FileSystem.CreateDirectory(String fullPath, UnixFileMode unixCreateMode)
         at System.IO.Directory.CreateDirectory(String path)
         at Sentry.InstallationIdHelper.TryGetPersistentInstallationId()

Looking at the corresponding code I cannot explain this behavior.

https://github.com/getsentry/sentry-dotnet/blob/4617aeea07649fa6ef781b24a787c1c5bc09d84b/src/Sentry/InstallationIdHelper.cs#L6-L91

It seems that the machine ID is being recomputed every time a transaction is being send.

On a side note: failing to get the persistent installation ID should not output a fail log. When using non-root .NET Docker containers you typically have no write access to the app root directory since you're using a different user.

jamescrosswell commented 1 month ago

Looking at the corresponding code I cannot explain this behavior.

Hi @klemmchr ,

The only reason I can think why you'd see that repeatedly is:

  1. Either if the log errors related to separate instances of your app running (either on a single device or multiple devices)
  2. Or if Sentry was also unable to resolve a HardwareInstallationId and this code failed: https://github.com/getsentry/sentry-dotnet/blob/009241d1fb662ae6ad88d5bffc854ca613c2ec72/src/Sentry/InstallationIdHelper.cs#L122-L124

I'm not sure where the comment // Never fails came from in the above source code... The docs say that reading the Environment.MachineName property can result in an InvalidOperationException if the name of this computer cannot be obtained.

Are you able to run a little tests to see if you can read the Environment.MachineName property in your application (to see if that might be the issue)? I can always wrap it in a try/catch, just in case, but doing that and rolling out a new release to work out whether this is causing the issue is probably a lot slower.

On a side note: failing to get the persistent installation ID should not output a fail log. When using non-root .NET Docker containers you typically have no write access to the app root directory since you're using a different user.

I think that's fair. What are your thoughts on these two similar errors when trying to resolve the hardware id as a fallback? https://github.com/getsentry/sentry-dotnet/blob/009241d1fb662ae6ad88d5bffc854ca613c2ec72/src/Sentry/InstallationIdHelper.cs#L95-L118

I think in that scenario the error is justified as I can't think of a scenario where someone would be running Sentry without a network card. But in your case, if you're getting an error reading the Environment.MachineName then that implies you're also not able to resolve the hardware installation id using the properties of the network card... so I'd be curious to understand why that's the case.

klemmchr commented 1 month ago

Either if the log errors related to separate instances of your app running (either on a single device or multiple devices)

I can definitely rule this out. The logs have been pulled from a single replica and I could watch the logs being generated in real time when simply using the application.

Or if Sentry was also unable to resolve a HardwareInstallationId and this code failed:

Even then this behavior is odd. TryGetInstallationId() is only called in SentryOptions once and wrapped with Lazy<>.
If we assume that TryGetInstallationId threw an exception this would result in either Enricher.Apply or GlobalSessionManager.StartSession failing altogether. Also, I did not see any errors about being unable to get the installation id from the NIC. So my only idea would be that Environment.MachineName returns null (for whatever reason).

Nontheless, there should be a proper fallback if shit hits the fan. Either InstallationIdHelper should have a boolean flag that computation did already happen to prevent recomputation or the Lazy<> bit should be moved from SentryOptions to InstallationIdHelper. This would also have the benefit that locking wouldn't be needed any more since Lazy is thread safe and _installationId could be removed. Alternatively, generating a random GUID (or whatever other format) as the ultimate fallback would be an option.

Are you able to run a little tests to see if you can read the Environment.MachineName property in your application (to see if that might be the issue)? I can always wrap it in a try/catch, just in case, but doing that and rolling out a new release to work out whether this is causing the issue is probably a lot slower.

I will try to make some repro for that. I'm not sure whether I can match the full setup, but I will try at least.

I think in that scenario the error is justified as I can't think of a scenario where someone would be running Sentry without a network card.

I'm not sure whether an unprivileged user has access to network cards. I could imagine that in my scenario the app user has no access either and therefore NetworkInterface.GetAllNetworkInterfaces() returns an empty list, so at least no exception there. Emitting an error here once could be fine, but in general error logs should not be used when an operation is not failing. See the Microsoft docs for the full picture. IMHO this should be a warning at max since Sentry does not rely on a machine id and it can be null without affecting the functionality.

klemmchr commented 1 month ago

@jamescrosswell sorry, I accidentally closed this issue and cannot reopen it again. Could you do me the favor?

jamescrosswell commented 1 month ago

Thanks @klemmchr,

IMHO this should be a warning at max since Sentry does not rely on a machine id and it can be null without affecting the functionality.

That sounds logical. At the very least then, we could change those LogError statements to LogWarning statements.

Even then this behavior is odd. TryGetInstallationId() is only called in SentryOptions once and wrapped with Lazy<>.

Hm, that is odd. It implies multiple instances of SentryOptions are being created.

I will try to make some repro for that. I'm not sure whether I can match the full setup, but I will try at least.

That would definitely help as, if there genuinely are multiple SentryOptions instances, I don't think using Lazy initialisation in the InstallationIdHelper would make a difference. Be good to know exactly what's going on before putting together a fix then.

klemmchr commented 1 month ago

Hm, that is odd. It implies multiple instances of SentryOptions are being created.

Are SentryOptions being recreated when IConfiguration changes? I could imagine some misconfiguration with Azure App Configuration on our end that constantly reloads the configuration. It would be odd, but there is a chance.

I just revisited our logs. The spam ended around 24 hours ago. But the 24 hours before that I find more than 15000 log entries while hitting the query limit. There is probably way more than that. Given that it is probably not possible to reproduce the issue. However, I can still run the stuff in TryGetInstallationId in our environment and get the output from it.

To give this issue some context, I found this by accident when trying to enable profiling and monitoring the logs afterwards. I had weird issues that all of our requests were failing when enabling profiling so I turned Sentry debugging on but was not able to get proper log output. Then I switched both of them off and created this issue. I cannot remember which combination of settings was active when this happened.