KimiNewt / pyshark

Python wrapper for tshark, allowing Python packet parsing using Wireshark dissectors
MIT License

lxml.etree.XMLSyntaxError: Input is not proper UTF-8, indicate encoding ! #513

Closed: vadimszzz closed this issue 2 years ago

vadimszzz commented 2 years ago
Traceback (most recent call last):
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/x/Downloads/Projects/................./__main__.py", line 37, in <module>
    cli()
  File "/Users/x/Downloads/Projects/................./__main__.py", line 33, in cli
    raise e
  File "/Users/x/Downloads/Projects/................./__main__.py", line 23, in cli
    cli_commands()
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/Users/x/Downloads/Projects/................./cli/process.py", line 138, in run
    for pkt in cap:
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pyshark/capture/capture.py", line 240, in _packets_from_tshark_sync
    packet, data = self.eventloop.run_until_complete(
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pyshark/capture/capture.py", line 360, in _get_packet_from_stream
    packet = packet_from_xml_packet(packet, psml_structure=psml_structure)
  File "/Users/x/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pyshark/tshark/tshark_xml.py", line 26, in packet_from_xml_packet
    xml_pkt = lxml.objectify.fromstring(xml_pkt, parser)
  File "src/lxml/objectify.pyx", line 1883, in lxml.objectify.fromstring
  File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1784, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 156
lxml.etree.XMLSyntaxError: Input is not proper UTF-8, indicate encoding !
Bytes: 0xC6 0x03 0x08 0x01, line 156, column 1
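The last two lines of the traceback say that lxml rejected the XML tshark produced for one packet. At line 156, column 1 it hit the byte 0xC6, which in UTF-8 starts a two-byte sequence, but the next byte, 0x03, is not a continuation byte (continuation bytes are 0x80 to 0xBF). Even if that byte were replaced, 0x03, 0x08 and 0x01 are control characters that XML 1.0 does not allow at all. Below is a minimal stdlib sketch, not pyshark's or lxml's API, of sanitizing such bytes before handing them to a strict XML parser. The helper name `sanitize_xml_bytes` is hypothetical, and because bad bytes become U+FFFD, the affected field values are lossy.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production: C0 control characters other
# than tab, LF and CR, plus surrogates and U+FFFE/U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_xml_bytes(raw: bytes) -> bytes:
    """Hypothetical helper: make tshark-style XML bytes acceptable to a strict parser."""
    # Invalid UTF-8 sequences (such as 0xC6 followed by 0x03) become U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # Control characters that XML 1.0 forbids outright also become U+FFFD.
    text = _ILLEGAL_XML_CHARS.sub("\uFFFD", text)
    return text.encode("utf-8")


# The byte sequence from the error message, embedded in a toy element.
bad = b"<field show='\xc6\x03\x08\x01'/>"
try:
    bad.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid continuation byte

elem = ET.fromstring(sanitize_xml_bytes(bad))
print(ascii(elem.get("show")))  # '\ufffd\ufffd\ufffd\ufffd'
```

Hooking something like this into pyshark would presumably mean patching around the `packet_from_xml_packet` call shown in the traceback; that is an assumption, not a documented extension point.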
vadimszzz commented 2 years ago

It always raises when I try to process the com.google.ios.youtube dump, and sometimes with other pcaps as well. Files to reproduce: Archive.zip

    import pyshark

    # 4. Decrypt the traffic with the sslkeylog file
    # pcap_output and keylog_output are paths produced by earlier steps (not shown)
    cap = pyshark.FileCapture(
        input_file=pcap_output,
        override_prefs={'tls.keylog_file': keylog_output})
    for pkt in cap:  # <= lxml.etree.XMLSyntaxError: Input is not proper UTF-8, indicate encoding !
        ...
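To find which packet and field trigger the error, one hedged approach is to dump tshark's XML for the same capture outside pyshark (for example `tshark -r capture.pcap -o tls.keylog_file:keys.log -T pdml > dump.xml`, with placeholder file names) and look for the first byte sequence that is not valid UTF-8. The sketch below is stdlib-only, and `first_invalid_utf8` is a hypothetical helper name. Its line numbers will not match lxml's "line 156", which appears to be counted within a single packet's XML rather than across the whole dump.

```python
from typing import Optional, Tuple


def first_invalid_utf8(raw: bytes) -> Optional[Tuple[int, int, bytes]]:
    """Return (line, column, next_bytes) for the first invalid UTF-8 sequence, or None.

    Line and column are 1-based. next_bytes holds up to four bytes starting at
    the bad position, similar to the "Bytes: 0xC6 ..." part of lxml's message.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Count newlines before the bad byte to get the line number.
        line = raw.count(b"\n", 0, exc.start) + 1
        # rfind returns -1 when there is no earlier newline, so line_start is 0.
        line_start = raw.rfind(b"\n", 0, exc.start) + 1
        column = exc.start - line_start + 1
        return line, column, raw[exc.start:exc.start + 4]
    return None
```

Usage: `first_invalid_utf8(open("dump.xml", "rb").read())`. The surrounding lines of the dump should then show which protocol field tshark emitted the raw bytes in.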