Closed (brunchboy closed 6 years ago)
Yes, I can confirm that parsing them as UTF-16BE works perfectly for my `.pdb` files. You don’t need that `padded_length` type either; each record type definition just needs to know where the string actually begins, which is fixed for the record type. I think the reason this may not have seemed to work for you is that your `actual_length` parsing assumes a 3-byte length value, when it is actually a much more ordinary 2-byte integer.
If you want to see what I’m talking about, you can find my Kaitai Struct parse of the format here, along with a link to their incredible Web IDE which lets you explore the parse tree side by side with a hex viewer, using your own `.pdb` files.
I switched to UTF-16 parsing in commit feeea94d620a9ba0357c447ce2009e279fe1724a. The reason I went for 3-byte length values is the long ASCII strings, which have 3 bytes of data in front of the actual string, so I assumed the same number of length bytes for UTF-16 strings. If I just use 2-byte lengths for Unicode strings as recommended, parsing as UTF-16 works fine for all of my sample pdb files. Thanks for the comments!
https://github.com/flesniak/python-prodj-link/blob/4be86a86a74f1e90604c2d0e0abe2d2c2b02a238/pdblib/piostring.py#L11
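For anyone landing here later, this is roughly what the fix discussed in this thread amounts to: read a 2-byte length, then decode that many bytes as UTF-16BE. It is only a minimal sketch, not the code in `piostring.py`. The function name `read_utf16_string` is made up for illustration, and the byte order of the length field (little-endian here) and the assumption that the length counts only the payload bytes are mine, not something confirmed in this thread, so check them against your own files.

```python
import struct


def read_utf16_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Decode one length-prefixed UTF-16BE string.

    Returns the decoded text and the offset of the first byte after it.
    """
    # ASSUMPTION: the length is a 2-byte little-endian integer that counts
    # only the payload bytes. The thread confirms the 2-byte width, not the
    # byte order or what exactly the length covers.
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("string length runs past the end of the buffer")
    # The payload itself is big-endian UTF-16, as confirmed above.
    return buf[start:end].decode("utf-16-be"), end


# Build a small synthetic record (not real .pdb data) and parse it back.
payload = "Café".encode("utf-16-be")
record = struct.pack("<H", len(payload)) + payload + b"\xff"
text, next_offset = read_utf16_string(record)
```

Reading one byte too many for the length shifts the start of the payload by a byte, which for UTF-16 is enough to turn every character into garbage. That would explain why UTF-16 decoding seemed not to work with the 3-byte length.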