getsentry / sentry-dotnet

Sentry SDK for .NET
https://docs.sentry.io/platforms/dotnet
MIT License

Debug meta info not sent when using `PublishSingleFile=true` #2362

Closed mattjohnsonpint closed 1 year ago

mattjohnsonpint commented 1 year ago

Package

Sentry

.NET Flavor

.NET

.NET Version

7.0.2013

OS

Any (not platform specific)

SDK Version

3.31.0

Steps to Reproduce

Create a simple console app:

using Sentry;

SentrySdk.Init(options =>
{
    options.Dsn = "...";
    options.Debug = true;
});

try
{
    throw new Exception("Test");
}
catch (Exception exception)
{
    SentrySdk.CaptureException(exception);
}

In the csproj, configure to upload symbols and sources to Sentry (and authenticate with sentry-cli login).

<PropertyGroup>
    <SentryOrg>...</SentryOrg>
    <SentryProject>...</SentryProject>
    <SentryUploadSymbols>true</SentryUploadSymbols>
    <SentryUploadSources>true</SentryUploadSources>
</PropertyGroup>

Compile and publish with:

dotnet publish -c Release -p:PublishSingleFile=true

Run the app from its published folder:

./bin/Release/net7.0/osx-arm64/publish/MyConsoleApp

Expected Result

The event generated and shown in the console debug output should contain a debug_meta section, including the debug_id.

In Sentry, the source context should be visible, and the debug images section should show the symbols were found.

Actual Result

The event is missing the debug_meta section when -p:PublishSingleFile=true is used. Thus, source context is not shown.

Line numbers will still be shown if the .pdb file is present (which it is by default in the publish folder), but if you delete it - or ship the executable app without the pdb file, then client-side symbolication won't occur. Server-side symbolication also won't occur because the debug_meta section is missing from the event.

mattjohnsonpint commented 1 year ago

The code that resolves the assembly debug information is here:

https://github.com/getsentry/sentry-dotnet/blob/00299c7a950f7229b31153db5cff82001527b0d7/src/Sentry/Internal/DebugStackTrace.cs#L413-L422

https://github.com/getsentry/sentry-dotnet/blob/00299c7a950f7229b31153db5cff82001527b0d7/src/Sentry/Internal/DebugStackTrace.cs#L380-L388

I presume that we're not able to read the assembly directly from a file when that file is a single-file-executable. We need to figure out how to get a PEReader in that environment.

If necessary, we can make another AssemblyReader implementation, similar to AndroidAssemblyReader - but it also might be possible just to do this directly in TryReadAssembly.

mattjohnsonpint commented 1 year ago

Note from: https://learn.microsoft.com/dotnet/core/deploying/single-file/overview?tabs=cli#api-incompatibility

Module.FullyQualifiedName - Returns a string with the value of <Unknown> or throws an exception.

Thus, we'll need to figure out first how to get something more usable directly from the Module passed in to GetDebugImage.
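As a rough illustration of that incompatibility, the sketch below wraps Module.FullyQualifiedName so callers get null instead of the <Unknown> placeholder or an exception. The class and method names (ModulePathProbe, TryGetModulePath, IsUsablePath) are hypothetical and not part of the SDK; the only behaviour assumed is what the Microsoft note above describes.

```csharp
using System;
using System.Reflection;

public static class ModulePathProbe
{
    // Returns the module's file path, or null when the runtime cannot supply one
    // (for example when the module is embedded in a single-file app).
    public static string? TryGetModulePath(Module module)
    {
        try
        {
#pragma warning disable IL3002 // Deliberate: we handle the single-file case below.
            var name = module.FullyQualifiedName;
#pragma warning restore IL3002
            return IsUsablePath(name) ? name : null;
        }
        catch (Exception)
        {
            // The documentation says the property may throw instead of returning <Unknown>.
            return null;
        }
    }

    // Treats empty strings and the "<Unknown>" placeholder as "no path available".
    public static bool IsUsablePath(string? name) =>
        !string.IsNullOrEmpty(name) && name != "<Unknown>";
}
```

A caller could then branch on a null result and fall back to some other way of locating the assembly contents, rather than passing a placeholder string to a file reader.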

jamescrosswell commented 1 year ago

I'm getting slightly different behaviour than what's described in the issue details. When I run:

dotnet publish -c Release -p:PublishSingleFile=true

...the build does not succeed and there is nothing in ./bin/Release/ that could be run.

I get approximately 10 separate errors from the compiler, all of which are instances of the following 3 errors:

error IL3002: Using member 'System.Reflection.Module.Name' which has 'RequiresAssemblyFilesAttribute' can break functionality when embedded in a single-file app. Returns <Unknown> for modules with no file path.
error IL3002: Using member 'System.Reflection.Module.FullyQualifiedName' which has 'RequiresAssemblyFilesAttribute' can break functionality when embedded in a single-file app. Returns <Unknown> for modules with no file path.
error IL3000: 'System.Reflection.Assembly.Location' always returns an empty string for assemblies embedded in a single-file app. If the path to the app directory is needed, consider calling 'System.AppContext.BaseDirectory'.

Some of those errors come from Ben.Demystifier/TypeNameHelper.cs, Ben.Demystifier/ResolvedParameter.cs and Ben.Demystifier/Internal/PortablePdbReader.cs.

Others come from Sentry/PlatformAbstractions/RuntimeInfo.cs.

Finally some come from Sentry/Internal/DebugStackTrace.cs.

So it seems we'd need to leverage alternatives to those methods in multiple places in our codebase to fix this.

One possible solution described here:

... use AssemblyExtensions.TryGetRawMetadata. This method returns just the metadata blob, not the whole assembly. It works well for System.Reflection.Metadata reader and https://github.com/dotnet/runtime/issues/36590#issuecomment-688030287. I am not sure whether it works for Cecil.

mattjohnsonpint commented 1 year ago

Interesting. The errors make sense, though I wonder why you got them and I didn't.

Great find on AssemblyExtensions.TryGetRawMetadata. That sounds super promising. In theory, that approach could relieve us from needing to read the assembly from the file in all cases. Please explore that a bit. Thanks.

mattjohnsonpint commented 1 year ago

... though looking closer, I can't tell whether the values we need are actually in that metadata. One can get a MetadataReader from a PEReader, but not the opposite direction - so unless the PEHeader, CoffHeader, and DebugDirectoryEntries are available in the metadata somewhere I'm not seeing, then I don't think that will work.

jamescrosswell commented 1 year ago

Investigation so far

The Sentry SDK uses reflection to capture information about the stack trace:

  1. When creating a debug image to store in the Exception.StackTrace
  2. When enhancing the Stack Frame

Some of that reflection code currently relies on a file path to the assembly for the module being reflected on. This is problematic when the application is published as a single-file executable, because the assembly is embedded in the executable and does not have any file path.

Debug Image

Ultimately our code is trying to assemble the following:

var debugImage = new DebugImage
{
    Type = "pe_dotnet",
    CodeId = $"{headers.CoffHeader.TimeDateStamp:X8}{peHeader.SizeOfImage:x}",
    CodeFile = module.FullyQualifiedName,
    DebugId = $"{codeView.Guid}-{entry.Stamp:x8}", // or $"{codeView.Guid}-{codeView.Age}"
    DebugChecksum = $"{checksum.AlgorithmName}:{checksumHex}",
    DebugFile = peReader.ReadCodeViewDebugDirectoryData(entry).Path,
    ModuleVersionId = module.ModuleVersionId,
};

All of this is presumably just information to help Sentry track down the appropriate debug symbols, when the debug image is uploaded to Sentry (along with the rest of the exception information).

Much of this information comes from the PEReader.PEHeaders... and it doesn't appear to be available anywhere else.
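For reference, here is a minimal sketch of how those values can be read from a PEReader when the assembly does exist as a file on disk. It is not the SDK's actual implementation: the PeDebugInfo class is hypothetical, and the choice between the two DebugId formats based on IsPortableCodeView is an assumption made for illustration.

```csharp
using System;
using System.IO;
using System.Reflection.PortableExecutable;

public static class PeDebugInfo
{
    // Reads the PE-derived pieces of a debug image from an assembly file on disk.
    public static (string? CodeId, string? DebugId, string? DebugFile) Read(string assemblyPath)
    {
        using var stream = File.OpenRead(assemblyPath);
        using var peReader = new PEReader(stream);

        var headers = peReader.PEHeaders;
        string? codeId = headers.PEHeader is { } peHeader
            ? $"{headers.CoffHeader.TimeDateStamp:X8}{peHeader.SizeOfImage:x}"
            : null;

        string? debugId = null, debugFile = null;
        foreach (var entry in peReader.ReadDebugDirectory())
        {
            if (entry.Type != DebugDirectoryEntryType.CodeView)
            {
                continue;
            }

            var codeView = peReader.ReadCodeViewDebugDirectoryData(entry);
            // Assumption: portable PDBs use the entry stamp, other PDBs use the age.
            debugId = entry.IsPortableCodeView
                ? $"{codeView.Guid}-{entry.Stamp:x8}"
                : $"{codeView.Guid}-{codeView.Age}";
            debugFile = codeView.Path;
            break;
        }

        return (codeId, debugId, debugFile);
    }
}
```

Everything here hangs off a stream opened from a file path, which is exactly the piece that is unavailable for assemblies embedded in a single-file executable.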

The most obvious solution would be to use the PEReader to read in the information about a module/assembly that was loaded in memory rather than one that was located on disk... and the PEReader appears to have a constructor that could be used for this purpose:

PEReader(Byte*, Int32)  // Creates a Portable Executable reader over a PE image stored in memory.

However, I haven't found any way to work out where embedded modules are located in memory, or their size.
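To show what that constructor expects once an address and size are known, here is a small sketch that pins a byte array holding a PE image and reads a header value through PEReader(byte*, int). The InMemoryPe class is hypothetical, the code needs unsafe blocks enabled, and it does nothing to solve the open problem of locating an embedded module in the first place.

```csharp
using System;
using System.Reflection.PortableExecutable;

public static unsafe class InMemoryPe
{
    // Reads SizeOfImage from a PE image held in a managed byte array.
    // The array stays pinned for as long as the reader is in use.
    public static int ReadSizeOfImage(byte[] image)
    {
        fixed (byte* pointer = image)
        {
            // This overload assumes the bytes are laid out as in the file on disk.
            // An image already mapped by the OS loader would need the
            // PEReader(byte*, int, bool isLoadedImage) overload instead.
            using var peReader = new PEReader(pointer, image.Length);
            return peReader.PEHeaders.PEHeader?.SizeOfImage ?? 0;
        }
    }
}
```

For example, InMemoryPe.ReadSizeOfImage(File.ReadAllBytes(path)) gives the same value as reading the header from a file stream.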

Enhancing Stack Frames

This is done by Ben.Demystifier, which only needs a MetadataReader. We could potentially use something like this to get a MetadataReader for a module, without needing to know where the assembly for the module is located on disk:

if (!module.Assembly.TryGetRawMetadata(out byte* blob, out int length))
{
    return string.Empty;
}

var moduleMetadata = ModuleMetadata.CreateFromMetadata((IntPtr)blob, length);
var metadataReader = moduleMetadata.GetMetadataReader();

To use TryGetRawMetadata we'd need to:

We'd also need to make a pull request to the Ben.Demystifier project (or fork it) to modify the code in that package that assumes assembly modules will be in separate files.
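A variation on the snippet above that avoids the ModuleMetadata type is to pass the raw blob straight to the MetadataReader constructor. The sketch below does that and reads the module version ID as a quick sanity check. The RawMetadata class is hypothetical, and the code requires unsafe blocks to be enabled.

```csharp
using System;
using System.Reflection;
using System.Reflection.Metadata;

public static unsafe class RawMetadata
{
    // Reads the MVID from the metadata blob the runtime already has in memory,
    // without touching Assembly.Location or any file path.
    public static Guid? TryGetMvid(Assembly assembly)
    {
        if (!assembly.TryGetRawMetadata(out byte* blob, out int length))
        {
            return null;
        }

        // The blob is owned by the runtime and is only valid while the assembly is loaded.
        var reader = new MetadataReader(blob, length);
        return reader.GetGuid(reader.GetModuleDefinition().Mvid);
    }
}
```

The result should match Module.ModuleVersionId for the same assembly. Note that this only yields metadata: the PE and COFF headers and the debug directory are not part of this blob.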

Specifics

The following are the specific points in our code that are currently problematic when running as a single-file executable.

You can see these for yourself by creating the sample project referenced in the description of the problem and referencing the Sentry.csproj file directly (rather than the nuget package) and then running the following:

dotnet publish -c Release -p:PublishSingleFile=true

But I've summarized them here for convenience.

System.Reflection.Module.Name

IL3002: Using member 'System.Reflection.Module.Name' which has 'RequiresAssemblyFilesAttribute' can break functionality when embedded in a single-file app. Returns <Unknown> for modules with no file path.

System.Reflection.Assembly.Location

IL3000: 'System.Reflection.Assembly.Location' always returns an empty string for assemblies embedded in a single-file app. If the path to the app directory is needed, consider calling 'System.AppContext.BaseDirectory'.
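The alternative suggested by that diagnostic could look roughly like the following, assuming the caller only needs a directory rather than the path to a specific assembly file. The AppDirectory class is hypothetical.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class AppDirectory
{
    // Returns the directory containing the assembly when it has a file path,
    // otherwise falls back to the application's base directory.
    public static string Resolve(Assembly assembly)
    {
#pragma warning disable IL3000 // Deliberate: the empty-string case is handled below.
        var location = assembly.Location;
#pragma warning restore IL3000

        if (!string.IsNullOrEmpty(location))
        {
            var directory = Path.GetDirectoryName(location);
            if (!string.IsNullOrEmpty(directory))
            {
                return directory;
            }
        }

        // In a single-file app Location is empty, so use the base directory instead.
        return AppContext.BaseDirectory;
    }
}
```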

Some options for the RuntimeInfo

System.Reflection.Module.FullyQualifiedName

IL3002: Using member 'System.Reflection.Module.FullyQualifiedName' which has 'RequiresAssemblyFilesAttribute' can break functionality when embedded in a single-file app. Returns <Unknown> for modules with no file path.

mattjohnsonpint commented 1 year ago

Great job on the research. It seems some of these are more possible than others, but there's no quick-fix.

As for Ben.Demystifier - we are already taking our submodule from a fork at https://github.com/getsentry/Ben.Demystifier - so you can make modifications there. We can then work to merge those changes upstream separately.

jamescrosswell commented 1 year ago

Possibly some more progress. ILSpy can open self-contained executables... and ILSpy is open source.

The entry point to the relevant parts of their code, I believe, is the AssemblyTreeNode.LoadChildren() method... when you expand a tree node in the ILSpy UI that represents an assembly (which I think is what our single file executables would be represented as), this is the code that runs to enumerate all the various modules that are bundled into the single file executable.


The remaining challenge then is reverse engineering the ILSpy code to work out how they're doing that.

jamescrosswell commented 1 year ago

OK, ILSpy is cunning. Here's what it's doing:

Determine whether it's a bundle

If it can't load the assembly from a file, it then checks to see if it can load it from a bundle: var bundle = LoadedPackage.FromBundle(fileName);

Basically it loads the whole package into a MemoryMappedFile and then hunts for a bundle signature in that memory-mapped view. If it finds one, it can then return the bundleHeaderOffset, which is what is used to look up all the other bundle entries.
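The general shape of that search can be sketched as a plain byte-pattern scan over a memory-mapped file. This is not ILSpy's code: the SignatureScanner class is hypothetical, and the signature bytes are deliberately left as a parameter because the actual bundle signature is not given in this thread.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class SignatureScanner
{
    // Returns the offset of the first occurrence of the signature in the file, or -1.
    public static long Find(string path, byte[] signature)
    {
        long fileLength = new FileInfo(path).Length;
        if (signature.Length == 0 || fileLength < signature.Length)
        {
            return -1;
        }

        using var mappedFile = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        using var view = mappedFile.CreateViewAccessor(0, fileLength, MemoryMappedFileAccess.Read);

        // Naive scan: fine for a sketch, but a real implementation might search more efficiently.
        for (long start = 0; start <= fileLength - signature.Length; start++)
        {
            int matched = 0;
            while (matched < signature.Length && view.ReadByte(start + matched) == signature[matched])
            {
                matched++;
            }

            if (matched == signature.Length)
            {
                return start;
            }
        }

        return -1;
    }
}
```

How the header offset is derived from the position of the signature is a detail of the bundle format that would need to be taken from the ILSpy or .NET host sources.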

Get the other stuff in the bundle

Most of the work there happens in the SingleFileBundle.ReadManifest(Stream stream) method. This enumerates all of the entries (resource files bundled with the single-file executable, as well as any bundled assemblies), along with useful details such as the offset within the file/stream where each entry is kept.

Load details for specific entries/assemblies

In ILSpy at least, this happens when you try to expand the node representing an embedded assembly in the ILSpy UI. That's where the PackageFolderTreeNode.LoadChildrenForFolder method gets invoked, and there's some specific logic in there to handle dlls (I think folders are only relevant for resources - not for embedded assemblies).

This is where the offsets for bundled assemblies that were collected from the package manifest are used to load the bundled assemblies from memory - which happens in LoadedPackage.ResolveFileName(string name).

There's a bit of inception going on at this point. The LoadedAssembly constructor is called. One of the parameters that gets passed in is Task.Run(entry.TryOpenStream)... In the case of bundled assemblies, the concrete implementation of TryOpenStream that eventually gets called is BundleEntry.TryOpenStream. This method is critical, as it's where the logic to decompress bundle entries is implemented, if necessary. Otherwise, if the assembly wasn't compressed when it was bundled, a plain vanilla UnmanagedMemoryStream gets returned, starting at the appropriate entry offset.
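The compressed/uncompressed branch described above might be sketched like this. It is an illustration only: BundleEntrySketch and BundleEntryReader are hypothetical types, the bundle is held in a byte array rather than a memory-mapped file, and the use of raw Deflate for compressed entries is an assumption that should be checked against the ILSpy source.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical stand-in for a manifest entry: where the bytes live and how big they are.
public sealed record BundleEntrySketch(long Offset, long Size, long CompressedSize);

public static class BundleEntryReader
{
    // Returns a readable stream over one entry, decompressing it first if needed.
    public static Stream Open(byte[] bundle, BundleEntrySketch entry)
    {
        if (entry.CompressedSize == 0)
        {
            // Uncompressed: expose the slice of the bundle directly.
            return new MemoryStream(bundle, (int)entry.Offset, (int)entry.Size, writable: false);
        }

        // Compressed (assumed Deflate): inflate into a new in-memory stream.
        using var compressed = new MemoryStream(
            bundle, (int)entry.Offset, (int)entry.CompressedSize, writable: false);
        using var deflate = new DeflateStream(compressed, CompressionMode.Decompress);

        var decompressed = new MemoryStream((int)entry.Size);
        deflate.CopyTo(decompressed);
        decompressed.Position = 0;
        return decompressed;
    }
}
```

Either way the caller ends up with a seekable stream, which is the kind of input a PEReader can be constructed from.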

Finally, once that Task completes and hands back a stream for the assembly we want, this gets used in the LoadedAssembly constructor in a call to LoadAsync... which is the same method that loaded our single file executable... only this time, the branch of code that gets executed is not that dealing with bundles but the one that loads vanilla assemblies from a memory stream.

Thankfully, ILSpy also has an MIT License... so it'd be OK to copy/reuse whichever bits of this logic were appropriate.

mattjohnsonpint commented 1 year ago

Sounds like we're on the right path. Cool!

Thankfully, ILSpy also has an MIT License... so it'd be OK to copy/reuse whichever bits of this logic were appropriate.

If we are just learning from ILSpy and using the same approach, that's fine. If you actually need to copy code from the ILSpy project, please put it in its own subdirectory and add an attribution file. For example, see /src/Sentry/Internal/FastSerialization - which is another bit of code we've internalized. Thanks.