KimiNewt / pyshark

Python wrapper for tshark, allowing python packet parsing using wireshark dissectors
MIT License

Capture File only_summaries=True consistently yields a single packet despite multiple packets #648

Open failuntilwin opened 1 year ago

failuntilwin commented 1 year ago

I have tried using only_summaries=True on three separate capture files both in PyCharm and Jupyter. Each time, the ingest only yields a single packet as a result. However, omitting the only_summaries=True argument yields all of the packets.

To Reproduce:

Load the pcap with only_summaries=True. I have attached the pcap as hex below.

Expected behavior: all 857 frames should be available in the capture object. However, when only_summaries=True is used, only a single frame is yielded.

Example pcap / packet:

ssh_tty.txt

failuntilwin commented 1 year ago

[Two screenshots attached: "Screen Shot 2023-04-01 at 12 13 52 PM" and "Screen Shot 2023-04-01 at 12 14 52 PM"]

Vladimir-Chan commented 1 year ago

I also ran into this problem. After some digging, I found that the bug is on line 25 of src/pyshark/tshark/output_parser/tshark_xml.py, in the following method:

async def get_packets_from_stream(self, stream, existing_data, got_first_packet=True):
    if self._parse_summaries:
        existing_data = await self._get_psml_struct(stream)
    return await super().get_packets_from_stream(stream, existing_data, got_first_packet=got_first_packet)

The buggy line is if self._parse_summaries:. On the first call to get_packets_from_stream, there is not yet enough XML data to build a packet, so the method is called again. Because self._parse_summaries is still True, that second call runs _get_psml_struct again, which discards the existing data and sets existing_data to the remaining bytestring. One packet is built from that data, and it is the only packet you get. The third call to get_packets_from_stream raises EOFError and parsing ends. Adding a check for whether self._psml_structure is None fixes this, as follows:

if self._parse_summaries and self._psml_structure is None:

Because psml_structure_from_xml returns a list, the condition should compare self._psml_structure to None explicitly rather than rely on its truthiness, since an empty list is also falsy.
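To illustrate the failure mode, here is a minimal, self-contained sketch. It is not pyshark's code: ToyParser, parse_all, the 16-byte read size, and the tag markers are all made up for this example, and the parser is synchronous to keep it short. It only mimics the pattern described above, where the header read runs on every call and overwrites the buffered data.

```python
import io

HEADER_END = b"</structure>"
PACKET_END = b"</packet>"


class ToyParser:
    """Hypothetical stand-in for a summary parser; NOT pyshark's actual code."""

    def __init__(self, guard_header_read):
        self.guard_header_read = guard_header_read
        self.structure = None  # parsed header; None until it has been read

    def _read_header(self, stream):
        """Read until the header is found or EOF; return the bytes after it."""
        data = b""
        while True:
            chunk = stream.read(16)
            data += chunk
            if HEADER_END in data:
                header, _, rest = data.partition(HEADER_END)
                self.structure = header.split(b",")
                return rest
            if not chunk:  # EOF without a header: hand back what was read
                return data

    def get_packet(self, stream, existing_data):
        # Fixed variant: read the header only while it is still None.
        # Buggy variant: read it on every call, overwriting existing_data.
        need_header = self.structure is None if self.guard_header_read else True
        if need_header:
            existing_data = self._read_header(stream)
        if PACKET_END in existing_data:
            packet, _, rest = existing_data.partition(PACKET_END)
            return packet, rest
        chunk = stream.read(16)
        if not chunk:
            raise EOFError
        return None, existing_data + chunk


def parse_all(raw, guard_header_read):
    """Drive the parser until EOF and collect every packet it yields."""
    parser = ToyParser(guard_header_read)
    stream = io.BytesIO(raw)
    packets, data = [], b""
    while True:
        try:
            packet, data = parser.get_packet(stream, data)
        except EOFError:
            return packets
        if packet is not None:
            packets.append(packet)


# A header followed by five packets.
RAW = b"No.,Time</structure>" + b"".join(
    b"<packet>%d</packet>" % i for i in range(5)
)

print(len(parse_all(RAW, guard_header_read=False)))  # unguarded header read
print(len(parse_all(RAW, guard_header_read=True)))   # guarded with "is None"
```

With this input, the unguarded variant ends up with a single packet and the guarded variant with all five, which matches the behavior reported in this issue. The guard uses an explicit is None comparison for the reason given above.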