ValvePython / vdf

📜 Package for working with Valve's text and binary KeyValue format
https://pypi.org/project/vdf/
MIT License

Failure parsing binary VDF fields with invalid UTF-8 strings #20

Closed Matoking closed 5 years ago

Matoking commented 5 years ago

vdf fails to parse binary VDF files which contain fields with invalid UTF-8. One such invalid field was discovered in an entry for Spore inside appinfo.vdf. The exception that occurs is:

  File "/home/matoking/git/protontricks/env/lib/python3.7/site-packages/vdf/__init__.py", line 337, in binary_loads
    stack[-1][key], idx = read_string(s, idx)
  File "/home/matoking/git/protontricks/env/lib/python3.7/site-packages/vdf/__init__.py", line 305, in read_string
    result = result.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 12: invalid start byte

I've uploaded a snippet that can be used to reproduce this error here:

https://gist.github.com/Matoking/e2dfe281386ff4eac9022eb0f02d80cd

SteamDB seems to replace the invalid character with a question mark:

https://steamdb.info/app/24720/history/ (the faulty string here is Moje Spore v�tvory)

A similar fix in vdf would mean replacing the result.decode('utf-8') call with result.decode('utf-8', errors='replace') or letting the developer decide how to handle errors by passing an optional errors kwarg.
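As a rough illustration of the second option, here is a minimal sketch of a string reader that takes an optional errors argument. The read_cstring helper and its signature are hypothetical and not vdf's actual code, and the sample bytes are only an assumption reconstructed from the traceback above (byte 0xfd at position 12).

```python
def read_cstring(buf: bytes, idx: int = 0, errors: str = "replace"):
    """Read a NUL-terminated string starting at idx.

    Hypothetical helper for illustration only; returns (text, next_idx).
    The errors argument is passed straight to bytes.decode().
    """
    end = buf.index(b"\x00", idx)
    text = buf[idx:end].decode("utf-8", errors=errors)
    return text, end + 1


# Assumed sample: 0xfd is not a valid UTF-8 start byte.
raw = b"Moje Spore v\xfdtvory\x00"

# Default: the invalid byte becomes U+FFFD (the replacement character).
text, next_idx = read_cstring(raw)

# Callers that prefer a hard failure can still ask for it.
try:
    read_cstring(raw, errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Defaulting to 'replace' keeps the rest of the file parseable, while the keyword still lets a caller opt back into 'strict' or choose 'ignore'.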

rossengeorgiev commented 5 years ago

Yeah, PICS has this issue with malformed data. Valve doesn't seem to validate it properly on their end, and even truncates UTF-8 in the middle of glyphs. The best approach would be to use replace on errors, as there is not much a dev can do to handle this.
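To show what a string cut off in the middle of a glyph looks like to the decoder, here is a small sketch. The sample word is an assumption chosen for illustration, not data taken from PICS.

```python
# "ý" encodes to two bytes in UTF-8 (0xc3 0xbd).
full = "výtvory".encode("utf-8")

# Cut the data after the first byte of the two-byte sequence.
truncated = full[:2]  # b"v\xc3"

# Strict decoding fails on the incomplete sequence.
try:
    truncated.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'replace' substitutes U+FFFD for the dangling byte and keeps the rest.
replaced = truncated.decode("utf-8", errors="replace")

# 'ignore' silently drops the dangling byte instead.
ignored = truncated.decode("utf-8", errors="ignore")
```

With 'replace' the damage stays visible in the output, which is probably preferable to 'ignore' when the result is shown to users.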