getsentry / symbolicator

Native Symbolication as a Service
https://getsentry.github.io/symbolicator/
MIT License

Unable to fetch executable from unified depot from code_id #1321

Closed by rtjonnyr 8 months ago

rtjonnyr commented 8 months ago

Environment

I am running a local instance of Symbolicator built from the latest commit as of 2023/10/11 (https://github.com/getsentry/symbolicator/commit/8ce62ca7874504248f1a228f1d03b51f665a22f1).

Steps to Reproduce

  1. I configured my local Symbolicator and connected it to an S3 bucket that uses the unified layout and is already in use with the Sentry web app.
  2. I started the local Symbolicator.
  3. I configured Visual Studio to use the local instance (http://localhost:3021/proxy).
  4. In Visual Studio, I loaded a minidump downloaded from Sentry, for which the matching executable and debuginfo are in the S3 bucket, and hit Debug With Native Only.

Expected Result

I would expect symbolicator to be able to find the matching executable in S3.

Actual Result

Visual Studio makes a request using the code_id, and Symbolicator is unable to fetch the executable file. However, if I drop the executable alongside the dmp file, Visual Studio then requests the PDB using the debug_id, and the local Symbolicator instance correctly downloads the debuginfo file from S3 and serves the PDB to Visual Studio.

ashwoods commented 8 months ago

Hi! @rtjonnyr thanks for reaching out. We will discuss this with the team next week.

Swatinem commented 8 months ago

Have you tried turning up the log level? Symbolicator should internally log every URL that it tries to fetch.

It may well be that fetching files by CodeId does not work correctly. The logs should give some hints as to why.

rtjonnyr commented 8 months ago

> Have you tried turning up the log level? Symbolicator should internally log every URL that it tries to fetch.
>
> It may well be that fetching files by CodeId does not work correctly. The logs should give some hints as to why.

I have the logging level at trace. The output I get when Visual Studio tries to look up an executable that I have confirmed is stored in the S3 bucket is as follows.

2023-10-17T15:56:45.0293964Z TRACE symbolicator::endpoints::proxy: Resolving symstore proxy path `"Client.exe/65286A671152a000/Client.exe"`
2023-10-17T15:56:45.0296866Z DEBUG symbolicator::endpoints::proxy: Searching for ObjectId { code_id: Some(CodeId(65286a671152a000)), code_file: Some("Client.exe"), debug_id: None, debug_file: None, debug_checksum: None, object_type: Pe } ([Pe])
2023-10-17T15:56:45.0312159Z TRACE symbolicator::endpoints::proxy: Resolving symstore proxy path `"Client.exe/65286A671152a000/Client.ex_"`
2023-10-17T15:56:45.0324067Z TRACE symbolicator::endpoints::proxy: Resolving symstore proxy path `"Client.exe/65286A671152a000/file.ptr"`
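
For readers unfamiliar with the request shape in the trace lines above, here is a minimal sketch of how such a path splits into a code file and a code ID. It assumes the usual PE convention that the first eight hex digits of the code ID are the COFF TimeDateStamp and the remaining digits are SizeOfImage. `SymstoreRequest` and `parse_symstore_path` are hypothetical names for illustration and are not part of Symbolicator.

```rust
/// Parts of a symstore-style request path such as
/// `Client.exe/65286A671152a000/Client.exe`.
/// Hypothetical type, used only for this illustration.
#[derive(Debug, PartialEq)]
struct SymstoreRequest {
    code_file: String,
    timestamp: u32,
    size_of_image: u32,
}

/// Hypothetical helper: splits `<file>/<code_id>/<requested file>`.
/// Assumes the code ID is 8 hex digits of timestamp followed by the image size.
fn parse_symstore_path(path: &str) -> Option<SymstoreRequest> {
    let mut parts = path.split('/');
    let code_file = parts.next()?;
    let code_id = parts.next()?;
    let _requested_file = parts.next()?;
    if parts.next().is_some() {
        return None; // more than three path components
    }
    // Guard before split_at: needs ASCII input and at least one size digit.
    if code_id.len() <= 8 || !code_id.is_ascii() {
        return None;
    }
    let (timestamp, size) = code_id.split_at(8);
    Some(SymstoreRequest {
        code_file: code_file.to_string(),
        timestamp: u32::from_str_radix(timestamp, 16).ok()?,
        size_of_image: u32::from_str_radix(size, 16).ok()?,
    })
}

fn main() {
    let request = parse_symstore_path("Client.exe/65286A671152a000/Client.exe");
    println!("{:?}", request);
}
```

Note that nothing in this path carries a debug_id, which matches the `debug_id: None` in the DEBUG line.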

I am using symsorter to generate my executable and debuginfo files, and the Sentry web app correctly symbolicates the minidump I am trying to debug in Visual Studio. Based on my limited understanding, the S3 bucket does not contain enough information to serve this request: symsorter would have to generate a code_id -> debug_id mapping.

In the meantime, I have a separate process that lets engineers download executables and debuginfo files via bundle_id lookup and then decompress them with zstd into the exe and pdb.

rtjonnyr commented 8 months ago

It seems that in get_search_target_id() (in paths.rs), when the file type is FileType::Pe, as it is in this case, the code_id part of the identifier is ignored:

// PEs and PDBs are indexed by the debug id in lowercase breakpad format
// always.  This is done because code IDs by themselves are not reliable
// enough for PEs and are only useful together with the file name which
// we do not want to encode.
FileType::Pe | FileType::Pdb | FileType::PortablePdb => Some(Cow::Owned(
    identifier.debug_id?.breakpad().to_string().to_lowercase(),
)),

So I believe it is intended that code_id is not valid for lookup with a unified symbol server.
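
The effect of that match arm can be reproduced in isolation. The following is a simplified sketch, not Symbolicator's code: `ObjectId` here is a stand-in with plain string fields, `unified_pe_key` is a hypothetical name, the breakpad formatting step is omitted, and the debug_id value is made up.

```rust
/// Simplified stand-in for the identifier printed in the DEBUG log line.
/// Hypothetical: the fields are plain strings here.
#[allow(dead_code)]
struct ObjectId {
    code_id: Option<String>,
    debug_id: Option<String>,
}

/// Hypothetical model of the PE/PDB arm: the key is derived from the
/// debug_id only, lowercased. The code_id is never consulted.
fn unified_pe_key(id: &ObjectId) -> Option<String> {
    // `?` returns None as soon as the debug_id is missing.
    let debug_id = id.debug_id.as_deref()?;
    Some(debug_id.to_lowercase())
}

fn main() {
    // What the proxy request provided: a code_id but no debug_id.
    let from_request = ObjectId {
        code_id: Some("65286a671152a000".to_string()),
        debug_id: None,
    };
    println!("{:?}", unified_pe_key(&from_request));

    // Made-up debug_id in breakpad style, for contrast.
    let with_debug_id = ObjectId {
        code_id: None,
        debug_id: Some("0123456789ABCDEF0123456789ABCDEF1".to_string()),
    };
    println!("{:?}", unified_pe_key(&with_debug_id));
}
```

With only a code_id available, no search key can be built, so there is nothing to look up in the bucket.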

Swatinem commented 8 months ago

Yes indeed, you are right. The unified symbol server layout relies only on the debug_id. Executables have both IDs embedded in them, and those are available in minidumps as well.

The code_id itself is a very low-fidelity ID and can potentially produce collisions. That is why the Microsoft SSQP lookup scheme uses the code_file as well to disambiguate. We do not support that in the unified symbol layout, and I doubt it can be easily supported either.
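
To illustrate the collision concern, here is a small sketch under stated assumptions: the code ID is formatted as the timestamp in eight uppercase hex digits followed by the image size in lowercase hex, mirroring the request path in the trace output earlier in this thread, and the second module name (`Server.exe`) is invented. Both function names are hypothetical.

```rust
/// Formats a PE code ID the way it appears in the logged request path:
/// 8 uppercase hex digits of timestamp, then the image size in lowercase hex.
fn code_id(timestamp: u32, size_of_image: u32) -> String {
    format!("{:08X}{:x}", timestamp, size_of_image)
}

/// Builds a lookup path that also includes the file name, in the same
/// `<file>/<code_id>/<file>` shape as the logged request.
fn key_with_file(code_file: &str, timestamp: u32, size_of_image: u32) -> String {
    format!(
        "{}/{}/{}",
        code_file,
        code_id(timestamp, size_of_image),
        code_file
    )
}

fn main() {
    // Two different modules that happen to share a timestamp and image size.
    let (timestamp, size) = (0x65286A67, 0x1152a000);
    let client = key_with_file("Client.exe", timestamp, size);
    let server = key_with_file("Server.exe", timestamp, size);

    // The bare code IDs are identical; only the file name tells them apart.
    println!("{}", code_id(timestamp, size));
    println!("{}", client);
    println!("{}", server);
}
```

A layout keyed on the code ID alone would map both modules to the same entry, which is the ambiguity the file name resolves.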