KimiNewt / pyshark

Python wrapper for tshark, allowing python packet parsing using wireshark dissectors
MIT License

AttributeError: 'NoneType' object has no attribute 'proto' #617

Open jonaslejon opened 1 year ago

jonaslejon commented 1 year ago

Describe the bug Running pyshark on a specific pcap file makes it crash with the exception: AttributeError: 'NoneType' object has no attribute 'proto'

Full backtrace:

Traceback (most recent call last):
  File "/Users/jonasl/pcap2redis/pyshark2redis.py", line 27, in <module>
    for pkt in cap:
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/capture/capture.py", line 221, in _packets_from_tshark_sync
    packet, data = self.eventloop.run_until_complete(
  File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/tshark/output_parser/tshark_xml.py", line 27, in get_packets_from_stream
    return await super().get_packets_from_stream(stream, existing_data, got_first_packet=got_first_packet)
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/tshark/output_parser/base_parser.py", line 15, in get_packets_from_stream
    packet = self._parse_single_packet(packet)
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/tshark/output_parser/tshark_xml.py", line 30, in _parse_single_packet
    return packet_from_xml_packet(packet, psml_structure=self._psml_structure)
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/tshark/output_parser/tshark_xml.py", line 85, in packet_from_xml_packet
    return _packet_from_pdml_packet(xml_pkt)
  File "/Users/jonasl/Library/Python/3.10/lib/python/site-packages/pyshark/tshark/output_parser/tshark_xml.py", line 93, in _packet_from_pdml_packet
    layers = [XmlLayer(proto) for proto in pdml_packet.proto]
AttributeError: 'NoneType' object has no attribute 'proto'

To Reproduce Run the following code on the PCAP-file attached:

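import sys

import pyshark

# Read PCAP file
cap = pyshark.FileCapture(sys.argv[1])

print("# Starting to read PCAP file: " + sys.argv[1])

for pkt in cap:
    pass  # loop body was not included in the report; the crash occurs while iterating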

Expected behavior The library should not crash parsing tshark output.

Versions (please complete the following information):

Example pcap / packet The following PCAP-file can be used for testing: crash.pcap.gz

thebigdalt commented 1 year ago

To add another data point, I've encountered the exact same bug on an M1 Mac with the same pyshark and tshark versions with various PCAP files. However, I have not encountered this issue on an x64 machine running Ubuntu 22.04 with the same versions on the same capture files. My guess is it's specific to macOS and/or Apple Silicon hardware.

mahtin commented 1 year ago

I came to this issue because I have code that's crashing with the same error while listening on en0 on a MacBook Air.

I tried your pcap file with tshark version: TShark (Wireshark) 4.0.2 (v4.0.2-0-g415456d13370) on an M1 MacBook Air with BigSur 1.17.1 and sadly could not get a crash. It worked cleanly. I'm also running pyshark version: 0.5.3

(a few moments later)[1]

I went back a rev and downloaded 4.0.1 with tshark version: TShark (Wireshark) 4.0.1 (v4.0.1-0-ge9f3970b1527) and sadly I still could not get your code to crash. :( I did try it quite a few times.

Oh. Python version 3.10.8

I will try to add some specific crash info for my situation. At least I can replicate that part.

Sorry I can't help much more (for now).

[1] Spongebob

mahtin commented 1 year ago

(a few hours later)

I found the cause, but not the solution (yet). In pyshark/tshark/output_parser/tshark_xml.py, these lines:

def packet_from_xml_packet(xml_pkt, psml_structure=None):
...
        xml_pkt = lxml.objectify.fromstring(xml_pkt, parser)
...

It's the call to lxml.objectify.fromstring() that returns None and then causes pure hell further down the code. There's no test for None.
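A guard of the kind described as missing could look roughly like the sketch below. `parse_or_raise` and its `parse` argument are hypothetical names used only for illustration; they are not part of pyshark or lxml. The idea is simply to turn a silent None into an immediate, descriptive error instead of an AttributeError further down.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def parse_or_raise(parse: Callable[[bytes], Optional[T]], raw: bytes) -> T:
    """Call a parser and fail loudly if it returns None.

    `parse` stands in for any callable that may return None on
    unparseable input (hypothetical; not pyshark's actual API).
    """
    parsed = parse(raw)
    if parsed is None:
        # Include a short prefix of the input to help with debugging.
        raise ValueError(f"parser returned None for input starting with {raw[:40]!r}")
    return parsed
```

This does not fix the underlying parse failure; it only makes the failure point obvious.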

I believe (in my case) that because the input xml_pkt variable contains some Unicode characters, it fails. I have a copy of the value of xml_pkt that causes this. It's attached.

The following code shows fromstring() producing None from the attached file.

import lxml.objectify

# Read the attached packet XML in text mode, using the platform default encoding.
with open('xml-pkt.txt', 'r') as fd:
    xml_pkt = fd.read()
parser = lxml.objectify.makeparser(huge_tree=True, recover=True)
xml_pkt = lxml.objectify.fromstring(xml_pkt, parser)
if xml_pkt is None:
    print('xml_pkt None')
else:
    print('xml_pkt len() = %d' % len(xml_pkt))

I didn't debug any further.

xml_pkt.txt

BTW: The following parts of the xml file could well be what's triggering the error (if it really is a Unicode issue):

      <field name="" show="_rdlink._tcp.local: type PTR, class IN, Li Vol4ek 🐺iPhone ._rdlink._tcp.local" size="36" pos="443" value="c00c000c00010000114f0018154c6920566f6c34656b20f09f90ba6950686f6e6520c00c">
        <field name="dns.resp.name" showname="Name: _rdlink._tcp.local" size="2" pos="443" show="_rdlink._tcp.local" value="c00c"/>
        <field name="dns.resp.type" showname="Type: PTR (domain name PoinTeR) (12)" size="2" pos="445" show="12" value="000c"/>
        <field name="dns.resp.class" showname=".000 0000 0000 0001 = Class: IN (0x0001)" size="2" pos="447" show="0x0001" value="1" unmaskedvalue="0001"/>
        <field name="dns.resp.cache_flush" showname="0... .... .... .... = Cache flush: False" size="2" pos="447" show="0" value="0" unmaskedvalue="0001"/>
        <field name="dns.resp.ttl" showname="Time to live: 4431 (1 hour, 13 minutes, 51 seconds)" size="4" pos="449" show="4431" value="0000114f"/>
        <field name="dns.resp.len" showname="Data length: 24" size="2" pos="453" show="24" value="0018"/>
        <field name="dns.ptr.domain_name" showname="Domain Name: Li Vol4ek 🐺iPhone ._rdlink._tcp.local" size="24" pos="455" show="Li Vol4ek 🐺iPhone ._rdlink._tcp.local" value="154c6920566f6c34656b20f09f90ba6950686f6e6520c00c"/>
      </field>

or:

      <field name="" show="_rdlink._tcp.local: type PTR, class IN, DN💋._rdlink._tcp.local" size="21" pos="572" value="c00c000c000100001155000906444ef09f928bc00c">
        <field name="dns.resp.name" showname="Name: _rdlink._tcp.local" size="2" pos="572" show="_rdlink._tcp.local" value="c00c"/>
        <field name="dns.resp.type" showname="Type: PTR (domain name PoinTeR) (12)" size="2" pos="574" show="12" value="000c"/>
        <field name="dns.resp.class" showname=".000 0000 0000 0001 = Class: IN (0x0001)" size="2" pos="576" show="0x0001" value="1" unmaskedvalue="0001"/>
        <field name="dns.resp.cache_flush" showname="0... .... .... .... = Cache flush: False" size="2" pos="576" show="0" value="0" unmaskedvalue="0001"/>
        <field name="dns.resp.ttl" showname="Time to live: 4437 (1 hour, 13 minutes, 57 seconds)" size="4" pos="578" show="4437" value="00001155"/>
        <field name="dns.resp.len" showname="Data length: 9" size="2" pos="582" show="9" value="0009"/>
        <field name="dns.ptr.domain_name" showname="Domain Name: DN💋._rdlink._tcp.local" size="9" pos="584" show="DN💋._rdlink._tcp.local" value="06444ef09f928bc00c"/>
      </field>
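
To confirm that these records really carry non-ASCII names, the `value` hex strings from the two `dns.ptr.domain_name` fields above can be decoded by hand. This is a minimal sketch using only the standard library; `first_dns_label` is a hypothetical helper, and it reads only the first length-prefixed label (the trailing `c00c` is a DNS compression pointer and is ignored here).

```python
def first_dns_label(hex_value: str) -> str:
    """Decode the first length-prefixed DNS label from a hex string as UTF-8."""
    raw = bytes.fromhex(hex_value)
    length = raw[0]  # first byte is the label length
    return raw[1:1 + length].decode("utf-8")


# Hex values copied from the two dns.ptr.domain_name fields quoted above.
wolf = first_dns_label("154c6920566f6c34656b20f09f90ba6950686f6e6520c00c")
kiss = first_dns_label("06444ef09f928bc00c")

for label in (wolf, kiss):
    # isascii() is False for both, so each label contains non-ASCII characters.
    print(repr(label), "ascii-only:", label.isascii())
```

Both labels contain a four-byte UTF-8 sequence (f09f90ba and f09f928b), which is consistent with the Unicode theory.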

mahtin commented 1 year ago

Further testing (on a large public network, in this case airport Wi-Fi) shows that the above crash can be triggered by mDNS packets with Unicode (vs. ASCII) names. Setting the packet filter to "not udp port 5353" allows the code to run continuously without hitting this error. Once you allow mDNS packets, the code crashes within a few seconds.
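As a stopgap, the filter can be passed to pyshark as a capture filter. The sketch below assumes a live capture on en0 (the interface mentioned earlier in this thread) and uses `LiveCapture`'s `bpf_filter` argument; `sniff_without_mdns` is a hypothetical helper name. It only hides the problem by dropping mDNS traffic, so it is a workaround rather than a fix.

```python
MDNS_EXCLUDE_FILTER = "not udp port 5353"  # the capture filter quoted above


def sniff_without_mdns(interface: str = "en0"):
    """Yield packets from a live capture that excludes mDNS traffic."""
    import pyshark  # imported lazily; requires pyshark and tshark to be installed

    capture = pyshark.LiveCapture(interface=interface, bpf_filter=MDNS_EXCLUDE_FILTER)
    yield from capture.sniff_continuously()
```

Usage would be something like `for pkt in sniff_without_mdns(): ...`, run with whatever privileges live capture needs on your machine.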

Secondly ... In a modification to my previous code, I could stop fromstring() failing by either reading in my xml-pkt file using open(..., 'rb') or open(..., 'r', encoding='utf-8'). Maybe inside pyshark/tshark/tshark.py the subprocess() should process the output as unicode? I have not experimented with this yet.
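The two read modes mentioned above can be compared with a small standard-library sketch. It uses `xml.etree.ElementTree` purely as a stand-in for lxml (an assumption made so the example runs without extra packages), and the one-line XML sample is made up for illustration. What it shows is that both reads hand the parser the same emoji intact, while a plain `open(..., 'r')` depends on a locale-specific default encoding.

```python
import locale
import tempfile
import xml.etree.ElementTree as ET  # stand-in for lxml in this sketch
from pathlib import Path

# Made-up sample resembling one field from the attached packet XML.
xml_text = '<field name="dns.ptr.domain_name" show="DN\U0001F48B._rdlink._tcp.local"/>'

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "xml-pkt.txt"
    path.write_bytes(xml_text.encode("utf-8"))

    # open(path, 'r') with no encoding= would fall back to this default.
    print("default text encoding:", locale.getpreferredencoding(False))

    raw = path.read_bytes()                   # equivalent to open(..., 'rb')
    text = path.read_text(encoding="utf-8")   # equivalent to open(..., 'r', encoding='utf-8')

from_bytes = ET.fromstring(raw)
from_text = ET.fromstring(text)
print(from_bytes.get("show") == from_text.get("show"))
```

Whether lxml behaves the same way on every platform is exactly what is in question here, so treat this as an illustration of the encoding mechanics only.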

BTW: I'm not actually snooping on a public network; that was just a way to get a lot of random inbound packets. The packets received from pyshark were sent to /dev/null; I was just waiting for the error to occur. Lucky for me there are iPhones around here with emoji names. :)

mahtin commented 1 year ago

(sorry - I've been traveling and hence unable to dedicate some focus time to this). However ...

Here's the fix (which must be tested in more cases than just this one):

$ git diff src/pyshark/tshark/output_parser/tshark_xml.py
diff --git a/src/pyshark/tshark/output_parser/tshark_xml.py b/src/pyshark/tshark/output_parser/tshark_xml.py
index e6f4379..b03391d 100644
--- a/src/pyshark/tshark/output_parser/tshark_xml.py
+++ b/src/pyshark/tshark/output_parser/tshark_xml.py
@@ -77,9 +77,9 @@ def packet_from_xml_packet(xml_pkt, psml_structure=None):
     :return: Packet object.
     """
     if not isinstance(xml_pkt, lxml.objectify.ObjectifiedElement):
-        parser = lxml.objectify.makeparser(huge_tree=True, recover=True)
+        parser = lxml.objectify.makeparser(huge_tree=True, recover=True, encoding='utf-8')
         xml_pkt = xml_pkt.decode(errors='ignore').translate(DEL_BAD_XML_CHARS)
-        xml_pkt = lxml.objectify.fromstring(xml_pkt, parser)
+        xml_pkt = lxml.objectify.fromstring(xml_pkt.encode('utf-8'), parser)
     if psml_structure:
         return _packet_from_psml_packet(xml_pkt, psml_structure)
     return _packet_from_pdml_packet(xml_pkt)
$

The real fix is xml_pkt.encode('utf-8') for fromstring() and the just-to-be-pedantic fix is the , encoding='utf-8' for makeparser(). These two seem to fully allow mdns packets to flow thru the code without crashing.
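The decode, strip, re-encode round trip in the diff can be illustrated without lxml. In this sketch `BAD_XML_CHARS` is a hypothetical stand-in for pyshark's `DEL_BAD_XML_CHARS` (its exact contents are not shown in this thread), and `prepare_for_parser` is a made-up helper name. The point is that the parser ends up receiving UTF-8 bytes rather than a Python str.

```python
# Hypothetical stand-in for pyshark's DEL_BAD_XML_CHARS: drop control
# characters that XML 1.0 does not allow (tab, LF and CR are kept).
BAD_XML_CHARS = {c: None for c in range(0x20) if c not in (0x09, 0x0A, 0x0D)}


def prepare_for_parser(raw: bytes) -> bytes:
    """Return clean UTF-8 bytes suitable for handing to an XML parser."""
    # bytes.decode() defaults to UTF-8; errors="ignore" drops invalid bytes.
    text = raw.decode("utf-8", errors="ignore").translate(BAD_XML_CHARS)
    # Re-encode so the parser gets bytes, as in the patched fromstring() call.
    return text.encode("utf-8")


sample = b'<field show="DN\xf0\x9f\x92\x8b\x00\xff._rdlink._tcp.local"/>'
cleaned = prepare_for_parser(sample)
print(cleaned.decode("utf-8"))
```

The emoji survives the round trip, while the NUL byte and the invalid 0xff byte are removed.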

Once again ... I'm testing on an M1 MacBook Air.

mahtin commented 1 year ago

I've added PR #624.