amlight / ofp_sniffer

An OpenFlow sniffer to help network troubleshooting in production networks.
Apache License 2.0

pcap packet buffer timeout set to zero makes ofp_sniffer not work properly #26

Open italovalcy opened 1 year ago

italovalcy commented 1 year ago

Hi,

Running ofp_sniffer on kernel 5.10.0-18-amd64 with the packet buffer timeout set to zero (open_live() called with to_ms=0, as ofp_sniffer currently does) makes ofp_sniffer show nothing for a long period of time while it waits for the capture buffer to fill.

From pcap man page:

packet buffer timeout

If, when capturing, packets are delivered as soon as they arrive, the application capturing the packets will be woken up for each packet as it arrives, and might have to make one or more calls to the operating system to fetch each packet. If, instead, packets are not delivered as soon as they arrive, but are delivered after a short delay (called a "packet buffer timeout"), more than one packet can be accumulated before the packets are delivered, so that a single wakeup would be done for multiple packets, and each set of calls made to the operating system would supply multiple packets, rather than a single packet. This reduces the per-packet CPU overhead if packets are arriving at a high rate, increasing the number of packets per second that can be captured.

The packet buffer timeout is required so that an application won't wait for the operating system's capture buffer to fill up before packets are delivered; if packets are arriving slowly, that wait could take an arbitrarily long period of time.

Not all platforms support a packet buffer timeout; on platforms that don't, the packet buffer timeout is ignored. A zero value for the timeout, on platforms that support a packet buffer timeout, will cause a read to wait forever to allow enough packets to arrive, with no timeout. A negative value is invalid; the result of setting the timeout to a negative value is unpredictable.

NOTE: the packet buffer timeout cannot be used to cause calls that read packets to return within a limited period of time, because, on some platforms, the packet buffer timeout isn't supported, and, on other platforms, the timer doesn't start until at least one packet arrives. This means that the packet buffer timeout should NOT be used, for example, in an interactive application to allow the packet capture loop to "poll" for user input periodically, as there's no guarantee that a call reading packets will return after the timeout expires even if no packets have arrived.

The packet buffer timeout is set with pcap_set_timeout().
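To make the quoted behavior concrete, here is a rough toy model in Python. It is not how libpcap or the kernel is implemented. It assumes a batch of packets is delivered either when a fixed-size buffer fills or, when to_ms is greater than zero, to_ms after the first packet of the batch arrived. The function name `delivery_times` and the `buffer_slots` parameter are made up for illustration.

```python
def delivery_times(arrivals_ms, buffer_slots, to_ms):
    """Toy model: when does the application see each packet?

    arrivals_ms  -- sorted packet arrival times in milliseconds
    buffer_slots -- hypothetical number of packets the buffer holds
    to_ms        -- packet buffer timeout; 0 means "no timeout"

    Returns one delivery time per packet, or None for packets that are
    still sitting in the buffer when the input ends (only with to_ms=0).
    """
    delivered = []
    batch = []  # arrival times of packets currently buffered
    for t in arrivals_ms:
        # Assumption: the timer starts when the first packet of a batch
        # arrives, so flush the batch if its deadline has passed.
        if batch and to_ms > 0 and t >= batch[0] + to_ms:
            deadline = batch[0] + to_ms
            delivered.extend([deadline] * len(batch))
            batch = []
        batch.append(t)
        # A full buffer is always delivered immediately.
        if len(batch) == buffer_slots:
            delivered.extend([t] * len(batch))
            batch = []
    # Whatever is left is delivered at its deadline, or never if to_ms=0.
    leftover = batch[0] + to_ms if batch and to_ms > 0 else None
    delivered.extend([leftover] * len(batch))
    return delivered


# One packet per second into a 4-slot buffer.
slow_traffic = [0, 1000, 2000, 3000]
no_timeout = delivery_times(slow_traffic, buffer_slots=4, to_ms=0)
with_timeout = delivery_times(slow_traffic, buffer_slots=4, to_ms=100)
```

In this model, with to_ms=0 the first packet is only seen once the fourth one fills the buffer, while with to_ms=100 each packet is seen about 100 ms after it arrives. Real capture buffers are far larger than four packets, which would make the wait with slow traffic correspondingly longer.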

Setting to_ms=100 fixes the weird behavior above.

diff --git a/libs/core/cli.py b/libs/core/cli.py
index 5f722c3..b0649d8 100644
--- a/libs/core/cli.py
+++ b/libs/core/cli.py
@@ -84,7 +84,7 @@ def start_capture(capfile, infilter, dev):
             cap = pcapy.open_offline(capfile)
         else:
             print("Sniffing device %s" % dev)
-            cap = pcapy.open_live(dev, 65536, 1, 0)
+            cap = pcapy.open_live(dev, 65536, 1, 100)

     except Exception as exception:
         print("Error: %s" % exception)

viniarck commented 1 year ago

Great find, @italovalcy.

It's very interesting that it also reduces per-packet CPU overhead. 100 ms sounds like a great default (so maybe we don't even need to expose it as a CLI option). It may also be worth trying it out on an interface with heavier OpenFlow traffic, though I don't expect surprises, since to_ms acts like a poll interval without strong guarantees and, as you quoted, it is simply ignored on platforms that don't support it.
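If the timeout ever does need to be configurable, a minimal sketch could look like the following. This is only an illustration, not what libs/core/cli.py does today: the `--to-ms` flag, `timeout_ms`, `build_parser`, and `open_live_capture` are hypothetical names, and rejecting zero is a design choice made here because zero is what triggered this issue. The only real API assumed is `pcapy.open_live(device, snaplen, promisc, to_ms)`, as used in the patch above.

```python
import argparse

DEFAULT_TO_MS = 100  # default proposed in the patch above


def timeout_ms(value):
    """argparse type (hypothetical): accept only positive milliseconds."""
    ms = int(value)  # a ValueError here is reported by argparse as invalid
    if ms <= 0:
        # 0 means "wait until the buffer fills"; negative values are invalid.
        raise argparse.ArgumentTypeError(
            "packet buffer timeout must be a positive number of milliseconds")
    return ms


def build_parser():
    """Build a parser with a hypothetical --to-ms option."""
    parser = argparse.ArgumentParser(description="capture options (sketch)")
    parser.add_argument("--to-ms", type=timeout_ms, default=DEFAULT_TO_MS,
                        help="pcap packet buffer timeout in milliseconds")
    return parser


def open_live_capture(dev, to_ms=DEFAULT_TO_MS, opener=None):
    """Open a live capture on dev with the given packet buffer timeout.

    opener is injectable so the function can be exercised without pcapy
    or root privileges; by default it is pcapy.open_live.
    """
    if opener is None:
        import pcapy  # imported lazily; only needed for real captures
        opener = pcapy.open_live
    # Same snaplen (65536) and promiscuous flag (1) as the patch above.
    return opener(dev, 65536, 1, to_ms)
```

For example, `open_live_capture("lo", build_parser().parse_args(["--to-ms", "250"]).to_ms)` would open lo with a 250 ms timeout, and omitting the flag keeps the 100 ms default.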

I can also confirm that on my laptop running Linux/x86_64, with to_ms=0, ofp_sniffer went a long time without showing any packets, whereas with your patch (to_ms=100) it immediately started printing packets to stdout:

❯ uname -a
Linux vx 6.1.31-2-MANJARO #1 SMP PREEMPT_DYNAMIC Sun Jun  4 12:31:46 UTC 2023 x86_64 GNU/Linux

I've also plotted the CPU and RAM usage of the ofp_sniffer.py process over 2 minutes of live capture on lo using a Mininet ring topology. Both CPU and RAM are stable:

[image: ofps]