Closed jan-auer closed 2 years ago
Have a look on this file: https://github.com/mozilla/dump_syms/blob/master/test_data/basic-opt32.pdb, in particular for proc symbols located in d:\agent_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm. There are: _local_unwind4, _unwind_handler4, _seh_longjmp_unwind4, _EH4_CallFilterFunc, _EH4_TransferToHandler, _EH4_GlobalUnwind2, _EH4_LocalUnwind:
[calixte@rouxpanda] ~/dev/mozilla/pdb$ cargo run --release --example pdb_lines ../dump_syms.calixteman/test_data/basic-opt32.pdb | rg '_local_unwind4|_unwind_handler4|_seh_longjmp_unwind4|_EH4_CallFilterFunc|_EH4_TransferToHandler| _EH4_GlobalUnwind2|_EH4_LocalUnwind' -A1
Finished release [optimized] target(s) in 0.05s
Running `target/release/examples/pdb_lines ../dump_syms.calixteman/test_data/basic-opt32.pdb`
+ @_EH4_CallFilterFunc@8
0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ _local_unwind4
0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ _seh_longjmp_unwind4
0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
- _unwind_handler4
0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ @_EH4_LocalUnwind@16
0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ @_EH4_TransferToHandler@8
0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
The lines for each symbol start at 0xaeb0 which is wrong. So I think that the correct way to get the lines is to get the corresponding section and then get the line with the the correct offset to match the offset given in parameter. But when getting lines iterator starting at correct line for _local_unwind4, we'll get some extra lines corresponding to _seh_longjmp_unwind4. So I think lines_symbol should have a length parameter (which would be symbol length) to return an iterator on the lines where line offset is between given offset and given offset+length to be sure to get lines for the symbol. But that means that line.length computation for the last line should be done with symbol length (and not next line address).
FYI, all the function findLines** from DIA sdk have a length argument: https://docs.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/idiasession-findlinesbyaddr?view=vs-2019 So I think it makes really sense to have the same here.
That's a good point. I was sort of assuming that the user of lines_at_offset
might want to check manually, since it's quite easy to stop iterating. But with lines_for_symbol
it makes much more sense to pass in the length as well.
Alright, let's update the tests to use the above functions and then see if we can get correct behavior. For reference, do you have the expected lines for the functions you pasted above?
Also, what if the given offset points into the middle of a line record? Do you expect the line record to be included or not? (i.e. is the first line covering the given offset or at or after the given offset)?
Yep the output for security_check_cookie is correct. For the above functions: I get lines but as explained I get more lines than expected. About offset in the middle of line record: I'd say it shouldn't happen but who knows... So maybe return an Err in this case and let the user decides what to do in using lines for example or maybe something is wrong in the pdb...
This is still work in progress, unfortunately. I will have to rebase this on latest master
, and then revisit the behavior based on:
For the above functions: I get lines but as explained I get more lines than expected.
All issues mentioned in this PR have been fixed, and we have tests for these cases now. Pending a final verification with @loewenheim, I'm going to merge this.
ProcedureSymbol
records for MASM functions have code ranges associated that do not match the actual code or line records.lines_at_offset
assumed that if there is a lines subsection for a symbol, it shares the exact same start offset as the symbol. However, this is not the case for MASM functions, where the symbol's offset can lie after the actual start of the instructions / line records. Since the name oflines_at_offset
was chosen badly, it is now renamed tolines_for_symbol
, and explicitly states that it may return line records outside of the symbol's stated code range.A consumer of this should probably collect the line records for each symbol and infer a proper code range.
Also, the internal implementation of
lines_at_offset
was slow in two regards: It had linear runtime complexity, and repeatedly had to iterate through large chunks of memory to find the appropriate section.LineProgram
now collects and sorts all line sections ahead of time to perform faster binary search.Fixes #61 Fixes #63