jonaslejon opened this issue 1 year ago
To add another data point, I've hit the exact same bug on an M1 Mac with the same `pyshark` and `tshark` versions and various PCAP files. However, I have not hit it on an x64 machine running Ubuntu 22.04 with the same versions and the same capture files. My guess is that it's specific to macOS and/or Apple Silicon hardware.
I came to this issue because I have code that crashes with the same error while listening on `en0` on a MacBook Air.
I tried your pcap file with `tshark` version `TShark (Wireshark) 4.0.2 (v4.0.2-0-g415456d13370)` on an M1 MacBook Air with Big Sur 11.7.1 and sadly could not get a crash. It worked cleanly. I'm also running `pyshark` version 0.5.3.
(a few moments later)[1]
I went back a rev and downloaded 4.0.1 (`tshark` version `TShark (Wireshark) 4.0.1 (v4.0.1-0-ge9f3970b1527)`), and sadly I still could not get your code to crash. :( I did try it quite a few times.
Oh, and my Python version is 3.10.8.
I will try to add some specific crash info for my situation. At least I can replicate that part.
Sorry I can't help much more (for now).
[1] SpongeBob
(a few hours later)
I found the cause, but not the solution (yet). In `pyshark/tshark/output_parser/tshark_xml.py`, these lines:
```python
def packet_from_xml_packet(xml_pkt, psml_structure=None):
    ...
    xml_pkt = lxml.objectify.fromstring(xml_pkt, parser)
    ...
```
It's the call to `lxml.objectify.fromstring()` that returns `None`, which then causes pure hell further down the code. There's no test for `None`.
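A check like the following would turn the silent `None` into an immediate, descriptive error instead of the later `AttributeError`. This is a minimal sketch, not pyshark code: `parse_packet_xml` is a hypothetical helper, and the parser is passed in as a callable (for example a wrapper around `lxml.objectify.fromstring`) so the check can be shown with the standard library alone.

```python
from typing import Callable, Optional, TypeVar
import xml.etree.ElementTree as ET

T = TypeVar("T")


def parse_packet_xml(parse: Callable[[bytes], Optional[T]], xml_pkt: bytes) -> T:
    """Parse one packet's XML and fail loudly if the parser recovered nothing.

    `parse` stands in for something like
    `lambda data: lxml.objectify.fromstring(data, parser)`, which can return
    None when the parser was created with recover=True.
    """
    root = parse(xml_pkt)
    if root is None:
        # Fail here with some context instead of passing None downstream,
        # where it only surfaces later as an AttributeError.
        raise ValueError(f"could not parse packet XML starting with {xml_pkt[:80]!r}")
    return root


# Usage, with the standard library parser as a stand-in for lxml:
print(parse_packet_xml(ET.fromstring, b"<packet/>").tag)  # packet

# A parser that gives up and returns None now raises right away:
try:
    parse_packet_xml(lambda data: None, b"<packet>")
except ValueError as exc:
    print(exc)
```

Raising at the parse step keeps the failing XML in the error message, which is exactly the data needed to reproduce a report like this one.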
I believe (in my case) that it fails because the input `xml_pkt` variable contains some Unicode characters. I have a copy of the value of `xml_pkt` that causes this; it's attached.
The following code shows `fromstring()` producing `None` from the attached file:
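```python
import lxml.objectify

# Read the captured packet XML (attached as xml-pkt.txt) in text mode,
# using the platform's default encoding.
with open('xml-pkt.txt', 'r') as fd:
    xml_pkt = fd.read()

# Same parser options pyshark uses; with recover=True, lxml can return
# None instead of raising when it fails to build a tree.
parser = lxml.objectify.makeparser(huge_tree=True, recover=True)
xml_pkt = lxml.objectify.fromstring(xml_pkt, parser)
if xml_pkt is None:
    print('xml_pkt None')
else:
    print('xml_pkt len() = %d' % len(xml_pkt))
```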
I didn't debug any further.
BTW: the following parts of the XML file could well be what's triggering the error (if it really is a Unicode issue):
```xml
<field name="" show="_rdlink._tcp.local: type PTR, class IN, Li Vol4ek 🐺iPhone ._rdlink._tcp.local" size="36" pos="443" value="c00c000c00010000114f0018154c6920566f6c34656b20f09f90ba6950686f6e6520c00c">
  <field name="dns.resp.name" showname="Name: _rdlink._tcp.local" size="2" pos="443" show="_rdlink._tcp.local" value="c00c"/>
  <field name="dns.resp.type" showname="Type: PTR (domain name PoinTeR) (12)" size="2" pos="445" show="12" value="000c"/>
  <field name="dns.resp.class" showname=".000 0000 0000 0001 = Class: IN (0x0001)" size="2" pos="447" show="0x0001" value="1" unmaskedvalue="0001"/>
  <field name="dns.resp.cache_flush" showname="0... .... .... .... = Cache flush: False" size="2" pos="447" show="0" value="0" unmaskedvalue="0001"/>
  <field name="dns.resp.ttl" showname="Time to live: 4431 (1 hour, 13 minutes, 51 seconds)" size="4" pos="449" show="4431" value="0000114f"/>
  <field name="dns.resp.len" showname="Data length: 24" size="2" pos="453" show="24" value="0018"/>
  <field name="dns.ptr.domain_name" showname="Domain Name: Li Vol4ek 🐺iPhone ._rdlink._tcp.local" size="24" pos="455" show="Li Vol4ek 🐺iPhone ._rdlink._tcp.local" value="154c6920566f6c34656b20f09f90ba6950686f6e6520c00c"/>
</field>
```
or:
```xml
<field name="" show="_rdlink._tcp.local: type PTR, class IN, DN💋._rdlink._tcp.local" size="21" pos="572" value="c00c000c000100001155000906444ef09f928bc00c">
  <field name="dns.resp.name" showname="Name: _rdlink._tcp.local" size="2" pos="572" show="_rdlink._tcp.local" value="c00c"/>
  <field name="dns.resp.type" showname="Type: PTR (domain name PoinTeR) (12)" size="2" pos="574" show="12" value="000c"/>
  <field name="dns.resp.class" showname=".000 0000 0000 0001 = Class: IN (0x0001)" size="2" pos="576" show="0x0001" value="1" unmaskedvalue="0001"/>
  <field name="dns.resp.cache_flush" showname="0... .... .... .... = Cache flush: False" size="2" pos="576" show="0" value="0" unmaskedvalue="0001"/>
  <field name="dns.resp.ttl" showname="Time to live: 4437 (1 hour, 13 minutes, 57 seconds)" size="4" pos="578" show="4437" value="00001155"/>
  <field name="dns.resp.len" showname="Data length: 9" size="2" pos="582" show="9" value="0009"/>
  <field name="dns.ptr.domain_name" showname="Domain Name: DN💋._rdlink._tcp.local" size="9" pos="584" show="DN💋._rdlink._tcp.local" value="06444ef09f928bc00c"/>
</field>
```
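To check that these names really are multi-byte UTF-8, the hex `value` of each `dns.ptr.domain_name` field can be decoded by hand. A minimal sketch: `split_ptr_rdata` is a hypothetical helper that only handles the shape seen here (one length-prefixed label followed by a 2-byte DNS compression pointer). The first byte is the label length, the label bytes decode as UTF-8 (Multicast DNS names are UTF-8 per RFC 6762), and the trailing `c00c` points back to offset 12. Both emoji (🐺 U+1F43A, 💋 U+1F48B) are 4-byte UTF-8 sequences outside the Basic Multilingual Plane.

```python
from typing import Optional, Tuple


def split_ptr_rdata(rdata_hex: str) -> Tuple[str, Optional[int]]:
    """Split hex-encoded PTR RDATA into its first label and a trailing pointer.

    Handles only one length-prefixed label followed by a 2-byte
    compression pointer (RFC 1035, section 4.1.4).
    """
    data = bytes.fromhex(rdata_hex)
    length = data[0]                             # label length in octets
    label = data[1:1 + length].decode("utf-8")   # label text as UTF-8
    rest = data[1 + length:]
    pointer = None
    if len(rest) == 2 and (rest[0] & 0xC0) == 0xC0:  # top two bits set: pointer
        pointer = ((rest[0] & 0x3F) << 8) | rest[1]
    return label, pointer


for value in ("154c6920566f6c34656b20f09f90ba6950686f6e6520c00c",
              "06444ef09f928bc00c"):
    label, pointer = split_ptr_rdata(value)
    raw_len = len(label.encode("utf-8"))
    print(repr(label), raw_len, "bytes,", len(label), "code points, pointer ->", pointer)
```

The gap between byte length and code-point count (21 bytes vs. 18 code points for the first label) is what any bytes/str mix-up in the pipeline has to get right.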
Further testing (on a large public network, in this case airport Wi-Fi) shows that the above crash can be triggered by mDNS packets with Unicode (rather than ASCII) names. Setting the packet filter to `not udp port 5353` lets the code run continuously without hitting this error. Once mDNS packets are allowed, the code crashes within a few seconds.
Secondly, in a modification to my previous code, I could stop `fromstring()` failing by reading in my `xml-pkt` file with either `open(..., 'rb')` or `open(..., 'r', encoding='utf-8')`. Maybe the `subprocess()` call inside `pyshark/tshark/tshark.py` should process the output as Unicode? I have not experimented with this yet.
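As a generic illustration of that idea only (pyshark's own process handling may differ), Python's `subprocess.run()` can either decode a child's output as UTF-8 explicitly via `encoding=` or hand back raw bytes, mirroring the `open(..., 'r', encoding='utf-8')` and `open(..., 'rb')` workarounds. The child process below is a hypothetical stand-in for tshark that writes one of the emoji names from above.

```python
import subprocess
import sys

# Stand-in for tshark: a child process that writes the UTF-8 bytes of the
# "DN💋._rdlink._tcp.local" name seen above. This is not pyshark's code.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('DN\\U0001F48B._rdlink._tcp.local'.encode('utf-8'))",
]

# Option 1: decode stdout as UTF-8 explicitly, independent of the locale.
text = subprocess.run(child, capture_output=True, encoding="utf-8", check=True).stdout

# Option 2: keep stdout as bytes and let the consumer (e.g. an XML parser) decode it.
raw = subprocess.run(child, capture_output=True, check=True).stdout

print(repr(text), len(raw), "bytes")
```

Either way the decoding is pinned to UTF-8 rather than left to whatever the default text encoding happens to be.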
BTW: I'm not actually snooping on a public network; that was just a way to get a lot of random inbound packets. The packets received from pyshark were sent to `/dev/null`; I was just waiting for the error to occur. Lucky for me, there are iPhones around here with emoji names. :)
(Sorry, I've been traveling and hence unable to dedicate focus time to this.) However...
Here's the fix (which must be tested in more cases than just this one):
```diff
$ git diff src/pyshark/tshark/output_parser/tshark_xml.py
diff --git a/src/pyshark/tshark/output_parser/tshark_xml.py b/src/pyshark/tshark/output_parser/tshark_xml.py
index e6f4379..b03391d 100644
--- a/src/pyshark/tshark/output_parser/tshark_xml.py
+++ b/src/pyshark/tshark/output_parser/tshark_xml.py
@@ -77,9 +77,9 @@ def packet_from_xml_packet(xml_pkt, psml_structure=None):
     :return: Packet object.
     """
     if not isinstance(xml_pkt, lxml.objectify.ObjectifiedElement):
-        parser = lxml.objectify.makeparser(huge_tree=True, recover=True)
+        parser = lxml.objectify.makeparser(huge_tree=True, recover=True, encoding='utf-8')
         xml_pkt = xml_pkt.decode(errors='ignore').translate(DEL_BAD_XML_CHARS)
-        xml_pkt = lxml.objectify.fromstring(xml_pkt, parser)
+        xml_pkt = lxml.objectify.fromstring(xml_pkt.encode('utf-8'), parser)
     if psml_structure:
         return _packet_from_psml_packet(xml_pkt, psml_structure)
     return _packet_from_pdml_packet(xml_pkt)
$
```
The real fix is `xml_pkt.encode('utf-8')` for `fromstring()`, and the just-to-be-pedantic fix is the `encoding='utf-8'` for `makeparser()`. Together, these two changes seem to let mDNS packets flow through the code without crashing.
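The patch keeps the existing decode-and-clean step and only changes what is handed to the parser: UTF-8 bytes instead of a `str`. The same data flow can be sketched with the standard library's `xml.etree.ElementTree` standing in for `lxml.objectify`, so it runs without lxml. `parse_packet` is hypothetical, and `DEL_BAD_XML_CHARS` here is a re-creation, not pyshark's actual table. The sketch does not reproduce the macOS failure; it only shows the bytes-in shape of the fix.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for pyshark's DEL_BAD_XML_CHARS (its real definition
# is not shown in this thread): delete the C0 control characters that XML 1.0
# forbids, keeping tab, newline and carriage return.
DEL_BAD_XML_CHARS = {c: None for c in range(0x20) if c not in (0x09, 0x0A, 0x0D)}


def parse_packet(xml_pkt: bytes) -> ET.Element:
    """Mirror the data flow of the patched packet_from_xml_packet()."""
    # Decode tshark's bytes and drop characters the parser would reject...
    text = xml_pkt.decode(errors="ignore").translate(DEL_BAD_XML_CHARS)
    # ...then give the parser UTF-8 *bytes* rather than a str.
    return ET.fromstring(text.encode("utf-8"))


# A trimmed PDML-like fragment with an emoji name and a stray control character.
pkt = (
    '<packet>\x01<field name="dns.ptr.domain_name" '
    'show="DN\U0001F48B._rdlink._tcp.local"/></packet>'
).encode("utf-8")

field = parse_packet(pkt).find("field")
print(field.get("show"))
```

Passing bytes means the parser does its own UTF-8 decoding, so the emoji never has to cross the `str`-to-parser boundary that the thread points to.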
Once again, I'm testing on an M1 MacBook Air.
I've added PR #624.
**Describe the bug**
Running pyshark on a specific pcap file makes it crash with the exception:
`AttributeError: 'NoneType' object has no attribute 'proto'`

Full backtrace:

**To Reproduce**
Run the following code on the attached PCAP file:

**Expected behavior**
The library should not crash when parsing tshark output.

**Versions** (please complete the following information):

**Example pcap / packet**
The following PCAP file can be used for testing: crash.pcap.gz