Monstrofil / replays_unpack


LXML Etree parser causing decoding error #7

Closed jcw780 closed 4 years ago

jcw780 commented 4 years ago

    xml = etree.parse(f, parser=etree.XMLParser(encoding='utf8', remove_comments=True))
  File "src\lxml\etree.pyx", line 3521, in lxml.etree.parse
  File "src\lxml\parser.pxi", line 1880, in lxml.etree._parseDocument
  File "src\lxml\parser.pxi", line 1900, in lxml.etree._parseFilelikeDocument
  File "src\lxml\parser.pxi", line 1795, in lxml.etree._parseDocFromFilelike
  File "src\lxml\parser.pxi", line 1201, in lxml.etree._BaseParser._parseDocFromFilelike
  File "src\lxml\parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\lxml\parser.pxi", line 721, in lxml.etree._handleParseResult
  File "src\lxml\etree.pyx", line 318, in lxml.etree._ExceptionContext._raise_if_stored
  File "src\lxml\parser.pxi", line 370, in lxml.etree._FileReaderContext.copyToBuffer
  File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 2102: character maps to <undefined>

This happens when replay_unpack/core/entity_def/data_types/__init__.py reads alias.xml from the selected version. Also, changing 'r' to 'rb' in this line: https://github.com/Monstrofil/replays_unpack/blob/d84de65860803cc5cc5728952adf0794e46643e3/replay_unpack/core/entity_def/data_types/__init__.py#L83 apparently fixes the problem, along with setting the encoding.
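A minimal sketch of why the 'rb' change likely helps, assuming alias.xml is UTF-8 and the default text encoding on the affected machine is cp1252 (as the traceback suggests). To stay runnable without extra packages it uses the standard library's xml.etree.ElementTree instead of lxml; the sample file contents and the helper names are hypothetical and not taken from the repository.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for alias.xml: UTF-8 text containing a Cyrillic
# letter whose encoding (0xD1 0x8F) includes the byte 0x8f from the traceback.
XML_TEXT = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<root><!-- я --><ALIAS>INT32</ALIAS></root>\n"
)


def write_sample(path):
    """Write the sample document to disk as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(XML_TEXT)


def parse_text_mode_cp1252(path):
    """Mimic open(path, 'r') on a system whose default encoding is cp1252.

    The encoding is forced here so the failure is the same on any platform.
    Python decodes the bytes before the XML parser ever sees them.
    """
    with open(path, "r", encoding="cp1252") as f:
        return ET.parse(f)


def parse_binary_mode(path):
    """Open in binary mode so the parser receives raw bytes and decodes
    them itself according to the XML declaration."""
    with open(path, "rb") as f:
        return ET.parse(f)


if __name__ == "__main__":
    sample = os.path.join(tempfile.mkdtemp(), "alias.xml")
    write_sample(sample)
    try:
        parse_text_mode_cp1252(sample)
    except UnicodeDecodeError as exc:
        print("text mode failed:", exc)
    print("binary mode parsed:", parse_binary_mode(sample).getroot().tag)
```

In text mode the file object decodes with the platform default before the parser runs, so an encoding passed to the parser cannot take effect. In binary mode the parser handles the decoding.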

Monstrofil commented 4 years ago

Hi, I'll check it, but can you please also attach the XML file that causes this problem?

jcw780 commented 4 years ago

It's this one: https://github.com/Monstrofil/replays_unpack/blob/master/replay_unpack/clients/wows/versions/0_9_7/scripts/entity_defs/alias.xml

Monstrofil commented 4 years ago

Merged, thanks