
flesniak / python-prodj-link

A python interface to Pioneer ProDJ Link

Apache License 2.0


These seem to actually just be UTF-16BE strings, like in the dbserver protocol. #8

Closed: brunchboy closed this issue 6 years ago

brunchboy commented 6 years ago

https://github.com/flesniak/python-prodj-link/blob/4be86a86a74f1e90604c2d0e0abe2d2c2b02a238/pdblib/piostring.py#L11

brunchboy commented 6 years ago

Yes, I can confirm that parsing them as UTF-16BE works perfectly for my .pdb files. You don’t need that padded_length type either; each record type definition just needs to know where the string actually begins, which is fixed for the record type. I think the reason this may not have seemed to work for you is that your actual_length parsing assumes a 3-byte length value, when it is actually a much more ordinary 2-byte integer.
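A minimal Python sketch of the parsing described above, under a hypothetical layout: the caller passes the fixed offset where the string begins for its record type, and the string is an unsigned 2-byte length giving the number of text bytes, followed immediately by that many bytes of UTF-16BE text. The byte order of the length field (little-endian by default here) is an assumption this thread does not confirm, and `read_utf16be_string` is an illustrative name, not part of python-prodj-link.

```python
import struct


def read_utf16be_string(buf: bytes, offset: int, length_fmt: str = "<H") -> str:
    """Decode a length-prefixed UTF-16BE string that starts at `offset`.

    Hypothetical layout (an assumption, not confirmed in this thread):
    an unsigned 2-byte length giving the number of text bytes, followed
    immediately by that many bytes of UTF-16BE text.
    """
    # Read the 2-byte length; "<H" = little-endian, ">H" = big-endian (assumed).
    (n_bytes,) = struct.unpack_from(length_fmt, buf, offset)
    start = offset + struct.calcsize(length_fmt)
    raw = buf[start:start + n_bytes]
    if len(raw) != n_bytes:
        raise ValueError("string extends past the end of the buffer")
    return raw.decode("utf-16-be")


# Usage: a made-up record with two bytes of other fields, then the string.
encoded = "Café".encode("utf-16-be")
record = b"\xAA\xBB" + struct.pack("<H", len(encoded)) + encoded
print(read_utf16be_string(record, offset=2))  # Café
```

No separate padded-length type is needed in this sketch: the record definition supplies the offset, and the 2-byte length alone determines how many bytes to decode.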

brunchboy commented 6 years ago

If you want to see what I’m talking about, you can find my Kaitai Struct parse of the format here, along with a link to their incredible Web IDE which lets you explore the parse tree side by side with a hex viewer, using your own .pdb files.

flesniak commented 6 years ago

I switched to UTF-16 parsing in commit feeea94d620a9ba0357c447ce2009e279fe1724a. I went with 3-byte length values because the long ASCII strings have 3 bytes of data in front of the actual string, so I assumed UTF-16 strings used the same number of length bytes. If I just use 2-byte lengths for Unicode strings as recommended, parsing as UTF-16 works fine for all of my sample pdb files. Thanks for the comments!
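To illustrate why a 3-byte length assumption breaks UTF-16 decoding, the following hypothetical Python sketch decodes the same bytes after skipping either 2 or 3 prefix bytes. Under the assumed layout (a 2-byte little-endian length followed by UTF-16BE text), skipping one extra byte shifts every 2-byte code unit boundary, so the text comes out as unrelated characters. The helper `decode_after_prefix` is illustrative only and isolates the alignment effect; it ignores the length value itself.

```python
import struct


def decode_after_prefix(buf: bytes, prefix_size: int) -> str:
    """Hypothetical helper: skip `prefix_size` bytes, decode the rest as UTF-16BE."""
    body = buf[prefix_size:]
    # An odd number of bytes cannot form whole UTF-16 code units; drop the last one.
    body = body[: len(body) - (len(body) % 2)]
    return body.decode("utf-16-be", errors="replace")


encoded = "Hi".encode("utf-16-be")                  # b"\x00H\x00i"
record = struct.pack("<H", len(encoded)) + encoded  # assumed 2-byte little-endian length

print(decode_after_prefix(record, 2))  # Hi
print(decode_after_prefix(record, 3))  # 䠀 (U+4800): bytes 0x48 0x00 read as one unit
```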