getsentry / pdb

A parser for Microsoft PDB (Program Database) debugging information
https://docs.rs/pdb/
Apache License 2.0
368 stars 68 forks source link

Fix lines for MASM symbols #68

Closed jan-auer closed 2 years ago

jan-auer commented 4 years ago

ProcedureSymbol records for MASM functions have code ranges associated that do not match the actual code or line records.

lines_at_offset assumed that if there is a lines subsection for a symbol, it shares the exact same start offset as the symbol. However, this is not the case for MASM functions, where the symbol's offset can lie after the actual start of the instructions / line records. Since the name of lines_at_offset was chosen badly, it is now renamed to lines_for_symbol, and explicitly states that it may return line records outside of the symbol's stated code range.

A consumer of this should probably collect the line records for each symbol and infer a proper code range.

Also, the internal implementation of lines_at_offset was slow in two regards: It had linear runtime complexity, and repeatedly had to iterate through large chunks of memory to find the appropriate section. LineProgram now collects and sorts all line sections ahead of time to perform faster binary search.

Fixes #61 Fixes #63

calixteman commented 4 years ago

Have a look on this file: https://github.com/mozilla/dump_syms/blob/master/test_data/basic-opt32.pdb, in particular for proc symbols located in d:\agent_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm. There are: _local_unwind4, _unwind_handler4, _seh_longjmp_unwind4, _EH4_CallFilterFunc, _EH4_TransferToHandler, _EH4_GlobalUnwind2, _EH4_LocalUnwind:

[calixte@rouxpanda] ~/dev/mozilla/pdb$ cargo run --release --example pdb_lines ../dump_syms.calixteman/test_data/basic-opt32.pdb | rg '_local_unwind4|_unwind_handler4|_seh_longjmp_unwind4|_EH4_CallFilterFunc|_EH4_TransferToHandler| _EH4_GlobalUnwind2|_EH4_LocalUnwind' -A1
    Finished release [optimized] target(s) in 0.05s
     Running `target/release/examples/pdb_lines ../dump_syms.calixteman/test_data/basic-opt32.pdb`
+ @_EH4_CallFilterFunc@8
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ _local_unwind4
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ _seh_longjmp_unwind4
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
- _unwind_handler4
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ @_EH4_LocalUnwind@16
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ @_EH4_TransferToHandler@8
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)

The lines for each symbol start at 0xaeb0 which is wrong. So I think that the correct way to get the lines is to get the corresponding section and then get the line with the the correct offset to match the offset given in parameter. But when getting lines iterator starting at correct line for _local_unwind4, we'll get some extra lines corresponding to _seh_longjmp_unwind4. So I think lines_symbol should have a length parameter (which would be symbol length) to return an iterator on the lines where line offset is between given offset and given offset+length to be sure to get lines for the symbol. But that means that line.length computation for the last line should be done with symbol length (and not next line address).

calixteman commented 4 years ago

FYI, all the function findLines** from DIA sdk have a length argument: https://docs.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/idiasession-findlinesbyaddr?view=vs-2019 So I think it makes really sense to have the same here.

jan-auer commented 4 years ago

That's a good point. I was sort of assuming that the user of lines_at_offset might want to check manually, since it's quite easy to stop iterating. But with lines_for_symbol it makes much more sense to pass in the length as well.

Alright, let's update the tests to use the above functions and then see if we can get correct behavior. For reference, do you have the expected lines for the functions you pasted above?

Also, what if the given offset points into the middle of a line record? Do you expect the line record to be included or not? (i.e. is the first line covering the given offset or at or after the given offset)?

calixteman commented 4 years ago

Yep the output for security_check_cookie is correct. For the above functions: I get lines but as explained I get more lines than expected. About offset in the middle of line record: I'd say it shouldn't happen but who knows... So maybe return an Err in this case and let the user decides what to do in using lines for example or maybe something is wrong in the pdb...

jan-auer commented 4 years ago

This is still work in progress, unfortunately. I will have to rebase this on latest master, and then revisit the behavior based on:

For the above functions: I get lines but as explained I get more lines than expected.

jan-auer commented 2 years ago

All issues mentioned in this PR have been fixed, and we have tests for these cases now. Pending a final verification with @loewenheim, I'm going to merge this.