getsentry / pdb

A parser for Microsoft PDB (Program Database) debugging information
https://docs.rs/pdb/
Apache License 2.0

NGEN-generated PDB support #153

Open vvuk opened 5 months ago

vvuk commented 5 months ago

I'm working to improve some profiling tools (samply specifically) that use the pdb crate under the hood. Part of what I need is the ability to handle symbols for .NET, specifically symbols from Crossgen2-built ReadyToRun assemblies. I think that's where these come from, anyway: files that end in .ni.pdb, which I think are written here, with some comments about DiaSymReader and others: https://github.com/dotnet/runtime/blob/fc76b1cac3f02cc9729f6682d6850fd7982e9fe5/src/coreclr/tools/aot/ILCompiler.Diagnostics/PdbWriter.cs#L199

Here's an example of this type of PDB, from Microsoft's symbol server: dotnet.ni.dll. Also, just in case: this isn't a Portable PDB. It's a normal PDB, but I think written in a very limited way; it contains just the symbol information.

When read by the pdb crate, these PDBs show up as having no section information. That means the crate can't build an address map, which leaves me with no way of translating RVAs to symbols. The section contribution information is there, though; e.g., here's dia2dump -x:

*** SECTION CONTRIBUTION

    RVA        Address       Size    Module
  00001000  0001:00000000  00275000  C:\Users\cloudtest\AppData\Local\Temp\5egk1fj3.cpa\dotnet.dll
  00276000  0002:00000000  00034000  C:\Users\cloudtest\AppData\Local\Temp\5egk1fj3.cpa\dotnet.dll
  002AA000  0003:00000000  00004000  C:\Users\cloudtest\AppData\Local\Temp\5egk1fj3.cpa\dotnet.dll

These section contributions map directly to the 3 sections in the actual dotnet.dll binary. I have no idea where dia2dump is getting the RVA column above from, as it's not in the section contribution information. I do see some places in PdbWriter.cs where sections are written, but I have no idea where that info is going!

In case it's useful, there's one module in this PDB (again from dia2dump):

** Module: C:\Users\cloudtest\AppData\Local\Temp\5egk1fj3.cpa\dotnet.dll

CompilandDetails:
        Language: MSIL
        Target processor: ARM64
        Compiled for edit and continue: no
        Compiled without debugging info: no
        Compiled with LTCG: no
        Compiled with /bzalign: no
        Managed code present: no
        Compiled with /GS: no
        Compiled with /sdl: no
        Compiled with /hotpatch: no
        Converted by CVTCIL: no
        MSIL module: no
        Frontend Version: Major = 0, Minor = 0, Build = 0, QFE = 0
        Backend Version: Major = 8, Minor = 0, Build = 424, QFE = 16909
        Version string: Crossgen2 - 8.0.4+2d7eea252964e69be94cb9c847b371b23e4dd470

vvuk commented 5 months ago

DBIExtraStreams from pdb.extra_streams() is just full of None here.

JustasMasiulis commented 5 months ago

I have no idea where dia2dump is getting the RVA from above

https://github.com/getsentry/pdb/issues/17#issuecomment-2055784958

DIA uses the section map/OMF segment map (same thing, different names in different sources) to aid the translation. Section headers are not necessary to do the translation and this library simply doesn't implement the address translation this way.

vvuk commented 5 months ago

Ah ha! I just made my way there, but was trying to figure out how to use that data. Sounds like I'm on the right track, at least for a limited use case.

vvuk commented 5 months ago

Hmm, could maybe use another hint here @JustasMasiulis :) In this PDB, there isn't any omap data. So all I've got is the section_map.

DebugInformation { stream: Stream { source_view: ReadView(421 bytes) }, header:
 DBIHeader { signature: 4294967295, version: V70,
    age: 1, gs_symbols_stream: StreamIndex(8), internal_version: 36390,
    ps_symbols_stream: StreamIndex(9), pdb_dll_build_version: 33135,
    symbol_records_stream: StreamIndex(10), pdb_dll_rbld_version: 0,
    module_list_size: 140, section_contribution_size: 88,
    section_map_size: 84,
    file_info_size: 20, type_server_map_size: 0, mfc_type_server_index: 0, debug_header_size: 0, ec_substream_size: 25, flags: 0, machine_type: 0, reserved: 0 }, header_len: 64 }
// debug_header_size is 0, but just in case:
DBIExtraStreams { fpo: StreamIndex(None), exception: StreamIndex(None), fixup: StreamIndex(None), omap_to_src: StreamIndex(None), omap_from_src: StreamIndex(None), section_headers: StreamIndex(None), token_rid_map: StreamIndex(None), xdata: StreamIndex(None), pdata: StreamIndex(None), framedata: StreamIndex(None), original_section_headers: StreamIndex(None) }

If I parse the section_map as an OMFSegMapDesc (roughly from microsoft-pdb), I get this:

sec_count: 4, sec_count_log: 4
OMFSegMapDesc { flags: 269, ovl: 0, group: 0, frame: 1, seg_name_index: 65535, class_name_index: 65535, offset: 0, size: 2576384 }
OMFSegMapDesc { flags: 269, ovl: 0, group: 0, frame: 2, seg_name_index: 65535, class_name_index: 65535, offset: 0, size: 212992 }
OMFSegMapDesc { flags: 269, ovl: 0, group: 0, frame: 3, seg_name_index: 65535, class_name_index: 65535, offset: 0, size: 16384 }
OMFSegMapDesc { flags: 520, ovl: 0, group: 0, frame: 0, seg_name_index: 65535, class_name_index: 65535, offset: 0, size: 4294967295 }
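
For readers following along, here is a minimal sketch of how the dump above could be produced from the raw section map bytes. It assumes the layout described in microsoft-pdb: a 4-byte header (sec_count, sec_count_log) followed by 20-byte little-endian entries, which is consistent with section_map_size: 84 = 4 + 4 × 20. The names SegMapEntry and parse_section_map are hypothetical and not part of the pdb crate's API; how the raw bytes are obtained from the DBI stream is left out.

```rust
/// One section map entry, using the field names from the dump above.
/// Hypothetical type for illustration; not part of the pdb crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SegMapEntry {
    pub flags: u16,
    pub ovl: u16,
    pub group: u16,
    pub frame: u16,
    pub seg_name_index: u16,
    pub class_name_index: u16,
    pub offset: u32,
    pub size: u32,
}

// Little-endian readers; callers guarantee the slice is long enough.
fn u16_at(b: &[u8], i: usize) -> u16 {
    u16::from_le_bytes([b[i], b[i + 1]])
}

fn u32_at(b: &[u8], i: usize) -> u32 {
    u32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]])
}

/// Parses a raw section map substream (assumed layout: 4-byte header,
/// then `sec_count` entries of 20 bytes each).
/// Returns None if the buffer is shorter than the header advertises.
pub fn parse_section_map(bytes: &[u8]) -> Option<Vec<SegMapEntry>> {
    const HEADER_LEN: usize = 4;
    const ENTRY_LEN: usize = 20;

    if bytes.len() < HEADER_LEN {
        return None;
    }
    let sec_count = u16_at(bytes, 0) as usize;
    // bytes[2..4] holds sec_count_log, which this sketch does not need.
    let needed = HEADER_LEN + sec_count * ENTRY_LEN;
    if bytes.len() < needed {
        return None;
    }

    let entries = bytes[HEADER_LEN..needed]
        .chunks_exact(ENTRY_LEN)
        .map(|e| SegMapEntry {
            flags: u16_at(e, 0),
            ovl: u16_at(e, 2),
            group: u16_at(e, 4),
            frame: u16_at(e, 6),
            seg_name_index: u16_at(e, 8),
            class_name_index: u16_at(e, 10),
            offset: u32_at(e, 12),
            size: u32_at(e, 16),
        })
        .collect();
    Some(entries)
}
```

The length checks return None instead of panicking, so a truncated substream is reported to the caller rather than crashing a profiler.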

If I parse it as a DbiSectionMap from syzygy I get:

DBISectionMapItem { flags: 13, section_type: 1, unknown_data_1: 0, section_number: 1, unknown_data_2: 4294967295, rva_offset: 0, section_length: 2576384 }
DBISectionMapItem { flags: 13, section_type: 1, unknown_data_1: 0, section_number: 2, unknown_data_2: 4294967295, rva_offset: 0, section_length: 212992 }
DBISectionMapItem { flags: 13, section_type: 1, unknown_data_1: 0, section_number: 3, unknown_data_2: 4294967295, rva_offset: 0, section_length: 16384 }
DBISectionMapItem { flags: 8, section_type: 2, unknown_data_1: 0, section_number: 0, unknown_data_2: 4294967295, rva_offset: 0, section_length: 4294967295 }

DbiSectionMap packs flags/section_type into the 16-bit flags OMFSegMapDesc field, ok. But rva_offset is still 0 here. What am I missing?
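
The relationship between the two dumps can be checked with a small sketch. It assumes, based only on the values shown above, that the syzygy view treats the low byte of the 16-bit OMFSegMapDesc flags field as flags and the high byte as section_type. The function name split_flags is hypothetical.

```rust
/// Splits the 16-bit OMFSegMapDesc `flags` value into the
/// (flags, section_type) byte pair shown in the syzygy-style dump.
/// Assumption: low byte = flags, high byte = section_type.
pub fn split_flags(raw: u16) -> (u8, u8) {
    let flags = (raw & 0x00FF) as u8;
    let section_type = (raw >> 8) as u8;
    (flags, section_type)
}
```

With the values from this PDB, split_flags(269) gives (13, 1) and split_flags(520) gives (8, 2), matching the two dumps.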

JustasMasiulis commented 5 months ago

But rva_offset is still 0 here.

That is correct and this value is used as it is.

Hmm, could maybe use another hint here

For your specific PDB the segment frame is always 1, so there will be no section RVA "synthesis" (which is needed when there are no section headers) beyond adding 0x1000 (since there is no OMAP from) and your rva_offset (which is 0) to the symbol.offset.

vvuk commented 5 months ago

For your specific PDB the segment frame is always 1,

Hm, how do I know this? (And apologies, I'm still figuring out all the PDB details, so I'm not 100% familiar with what the "segment frame" is. Is it equivalent to the section here? And thank you for your help!)

beyond adding 0x1000 (since there is no OMAP from) and your rva_offset (which is 0) to the symbol.offset

Ok, so 0x1000 is assumed if there is no other information (+ the rva_offset from the section map)? What about the other two section map entries?

All the public symbols do fit within the first section's range, so it's a moot point here, but where is, e.g., 002AA000 coming from for the third entry in the contributions map?

I hacked a version of this into the crate that, it turns out, I'm actually using in samply (so many pdbs); thanks for your help.

vvuk commented 5 months ago

(Also to be clear, happy to do a PR for this upstream version of the crate as well if there's interest)

JustasMasiulis commented 5 months ago

For your specific PDB the segment frame is always 1,

Hm, how do I know this? (And apologies, I'm still figuring out all the PDB details, so I'm not 100% familiar with what the "segment frame" is. Is it equivalent to the section here? And thank you for your help!)

OMFSegMapDesc.frame from one of your previous samples. I wasn't clear about this, but I was looking only at symbols and their address translation.

beyond adding 0x1000 (since there is no OMAP from) and your rva_offset (which is 0) to the symbol.offset

Ok, so 0x1000 is assumed if there is no other information (+ the rva_offset from the section map)? What about the other two section map entries?

All the public symbols do fit within the first section's range, so it's a moot point here, but where is, e.g., 002AA000 coming from for the third entry in the contributions map?

Both the second and third entries refer to frame > 1 and need extra work to synthesize beyond just adding 0x1000. You need to add the sum of the sizes of the preceding OMFSegMapDesc entries.
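
The synthesis described above can be sketched as follows, using the sizes from the OMFSegMapDesc dump earlier in the thread. This is an illustration under stated assumptions, not how DIA or the pdb crate implements it: the function names are hypothetical, the first section is assumed to start at 0x1000, and the sizes are used as-is because they are already page-aligned in this PDB (other PDBs might need rounding, which this sketch does not attempt). The trailing frame 0 entry with size 4294967295 is left out.

```rust
/// Assumed RVA of the first section when no section headers or OMAP exist.
pub const FIRST_SECTION_RVA: u32 = 0x1000;

/// Given the sizes of the section map entries in frame order (frame 1 first),
/// returns a synthesized start RVA for each: 0x1000 plus the sum of the
/// sizes of all preceding entries.
pub fn synthesize_section_rvas(sizes: &[u32]) -> Vec<u32> {
    let mut rvas = Vec::with_capacity(sizes.len());
    let mut next = FIRST_SECTION_RVA;
    for &size in sizes {
        rvas.push(next);
        // Saturate instead of overflowing on sentinel-sized entries.
        next = next.saturating_add(size);
    }
    rvas
}

/// Translates a (frame, offset) symbol address to an RVA.
/// `frame` is 1-based; frame 0 or an out-of-range frame yields None.
pub fn symbol_rva(
    section_rvas: &[u32],
    frame: u16,
    rva_offset: u32,
    symbol_offset: u32,
) -> Option<u32> {
    let index = (frame as usize).checked_sub(1)?;
    let base = *section_rvas.get(index)?;
    base.checked_add(rva_offset)?.checked_add(symbol_offset)
}
```

Calling synthesize_section_rvas(&[2576384, 212992, 16384]) yields 0x1000, 0x276000, and 0x2AA000, which are the three RVAs in the dia2dump section contribution output at the top of the thread.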