davidpshaw / PyWBXMLDecoder

An ActiveSync WAP Binary XML (ASWBXML) Decoder Written in Python

MIT License

18 stars, 5 forks

UTF-8 support #6

Closed: user185953 closed this 5 years ago

user185953 commented 5 years ago

Based on https://github.com/cisiqo/PyWBXML/commit/770ad4cfd18d631f889d730d8b4c4903a6fb026e

The "errors='backslashreplace'" bit is there because in mitmproxy the priority is seeing every byte.

user185953 commented 5 years ago

WBXML fields are binary in general. Bytes with the highest bit set (0x80 and above) are currently corrupted by PyWBXMLDecoder. This patch changes the mode of corruption: valid UTF-8 is preserved and non-UTF-8 bytes are escaped.
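To illustrate the behaviour described above, here is a minimal sketch of decoding with Python's built-in "backslashreplace" error handler. The function name decode_wbxml_string is hypothetical and is not taken from the patch; only bytes.decode and its errors argument are standard Python.

```python
def decode_wbxml_string(raw: bytes) -> str:
    """Decode a field as UTF-8; bytes that are not valid UTF-8 are
    kept visible as \\xNN escapes instead of being dropped or mangled.

    Hypothetical helper for illustration, not the decoder's actual API.
    """
    return raw.decode("utf-8", errors="backslashreplace")


# Valid UTF-8 (including multi-byte characters) passes through unchanged.
valid = decode_wbxml_string("café".encode("utf-8"))   # 'café'

# Bytes that are not valid UTF-8 are escaped one by one,
# so every byte of the input remains visible in the output.
escaped = decode_wbxml_string(b"id=\xff\x80")         # 'id=\\xff\\x80'
```

The trade-off is that the result is no longer unambiguous: a field that literally contains a backslash followed by "xff" looks the same as an escaped 0xFF byte. For a traffic viewer that mostly needs every byte to be visible, that may be acceptable.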

iragsdale commented 5 years ago

This looks pretty good to me @davidpshaw. This is basically how we do it in Boxer.

user185953 commented 5 years ago

Thank you for the review, @iragsdale. The other commit I just pushed will be trickier: Bytes().hex() looks odd, CDATA with hex-encoded data is probably not right, and the output format changes. Does the general idea look OK, though?
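For readers following along, the hex-in-CDATA idea mentioned above could look roughly like the sketch below. This is an assumption about the general approach, not the code from the pushed commit; the helper names opaque_to_cdata and cdata_to_opaque are hypothetical. Only bytes.hex() and bytes.fromhex() are standard Python.

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def opaque_to_cdata(raw: bytes) -> str:
    """Render binary data as a CDATA section holding lowercase hex digits.

    Hypothetical helper: hex output never contains ']]>', so the
    CDATA section cannot be terminated early by the payload.
    """
    return CDATA_OPEN + raw.hex() + CDATA_CLOSE


def cdata_to_opaque(text: str) -> bytes:
    """Inverse of opaque_to_cdata: recover the original bytes."""
    if not (text.startswith(CDATA_OPEN) and text.endswith(CDATA_CLOSE)):
        raise ValueError("not a CDATA section")
    inner = text[len(CDATA_OPEN):-len(CDATA_CLOSE)]
    return bytes.fromhex(inner)


section = opaque_to_cdata(b"\x00\xff\x10")   # '<![CDATA[00ff10]]>'
```

Hex encoding is lossless and round-trips exactly, but it doubles the size of the data and makes embedded text unreadable, which is one reason the output format change needs discussion.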

davidpshaw commented 5 years ago

@user185953 still need to put up a PR into mitmproxy -- this project isn't brought in as a submodule there, it's duplicated.