KimiNewt / pyshark

Python wrapper for tshark, allowing python packet parsing using wireshark dissectors
MIT License

ValueError: Invalid UTF-8 sequence length when decoding 'string' #394

Closed eppane closed 2 years ago

eppane commented 4 years ago

Hello,

I am having an issue when loading packets from a .pcap-file with FileCapture, as follows:

capture = pyshark.FileCapture(os.path.join(mydir, myfile + '.pcap'), use_json=True, include_raw=True)
capture.load_packets()

Resulting in:

Traceback (most recent call last):
  File "pcap_bin_parser.py", line 26, in <module>
    capture.load_packets()
  File "C:\Users\Klupi\Anaconda3\envs\tf2gpu\lib\site-packages\pyshark-0.4.2.11-py3.7.egg\pyshark\capture\capture.py", line 131, in load_packets
    self.apply_on_packets(keep_packet, timeout=timeout)
  File "C:\Users\Klupi\Anaconda3\envs\tf2gpu\lib\site-packages\pyshark-0.4.2.11-py3.7.egg\pyshark\capture\capture.py", line 267, in apply_on_packets
    return self.eventloop.run_until_complete(coro)
  File "C:\Users\Klupi\Anaconda3\envs\tf2gpu\lib\asyncio\base_events.py", line 583, in run_until_complete
    return future.result()
  File "C:\Users\Klupi\Anaconda3\envs\tf2gpu\lib\site-packages\pyshark-0.4.2.11-py3.7.egg\pyshark\capture\capture.py", line 278, in packets_from_tshark
    await self._go_through_packets_from_fd(tshark_process.stdout, packet_callback, packet_count=packet_count)
  File "C:\Users\Klupi\Anaconda3\envs\tf2gpu\lib\site-packages\pyshark-0.4.2.11-py3.7.egg\pyshark\capture\capture.py", line 295, in _go_through_packets_from_fd
    psml_structure=psml_struct)
  File "C:\Users\Klupi\Anaconda3\envs\tf2gpu\lib\site-packages\pyshark-0.4.2.11-py3.7.egg\pyshark\capture\capture.py", line 350, in _get_packet_from_stream
    packet = packet_from_json_packet(packet, deduplicate_fields=self._json_has_duplicate_keys)
  File "C:\Users\Klupi\Anaconda3\envs\tf2gpu\lib\site-packages\pyshark-0.4.2.11-py3.7.egg\pyshark\tshark\tshark_json.py", line 41, in packet_from_json_packet
    pkt_dict = ujson.loads(json_pkt)
ValueError: Invalid UTF-8 sequence length when decoding 'string'

It seems that the ujson.loads() call is raising the error, as the traceback indicates. There is not much information available about this error in this context. Any tips on how to overcome the problem?
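One possible workaround, sketched below (this is not part of the pyshark API, just an illustration of the idea): ujson rejects input that is not valid UTF-8 outright, but if the raw bytes are first decoded with `errors='replace'`, invalid bytes become U+FFFD replacement characters and the stdlib `json` module can parse the result. The function name `tolerant_loads` is hypothetical.

```python
import json

def tolerant_loads(json_bytes):
    """Parse JSON bytes even when they contain invalid UTF-8, by
    substituting U+FFFD for undecodable bytes before parsing.
    (Workaround sketch only; not a pyshark function.)"""
    return json.loads(json_bytes.decode('utf-8', errors='replace'))

# 0xff/0xfe are invalid in UTF-8; ujson.loads would reject this input.
pkt = tolerant_loads(b'{"data": "\xff\xfe"}')
print(pkt['data'])  # two U+FFFD replacement characters
```

The trade-off is that the original non-UTF-8 bytes are lost, so this only helps if you do not need the exact raw payload from the JSON output (with `include_raw=True` the raw bytes may matter).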

Note: I first tried the pyshark version available via PyPI, which resulted in UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 32658: invalid start byte. Looking closer at the .pcap file, some packets begin with 0xff (bits 11111111), which is the first byte of the optional UTF-16 BOM (Byte Order Mark); 0xff is never valid in UTF-8.
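The observation above can be reproduced in plain Python (the payload bytes here are made up for illustration): 0xff can never start a UTF-8 sequence, so strict decoding fails immediately, while the same bytes decode fine as UTF-16, where 0xff 0xfe is the little-endian BOM.

```python
# Hypothetical payload beginning with a UTF-16 LE byte order mark.
raw = b'\xff\xfeA\x00'

try:
    raw.decode('utf-8')          # strict decoding, as ujson effectively does
except UnicodeDecodeError as exc:
    print(exc.reason)            # 'invalid start byte'

# The same bytes are valid UTF-16; the BOM is consumed by the decoder.
print(raw.decode('utf-16'))      # 'A'
```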

eppane commented 4 years ago

Additionally, this error appeared with the same traceback while loading another .pcap-file:

ValueError: Invalid octet in UTF-8 sequence when decoding 'string'

Anyone else having similar issues?