Major steps undertaken:
Fundamental changes:
Implemented monomorphic functions / methods for IPv4 / IPv6 handling, so that the processing flow is separated once (based on the IP version) instead of continuously having to branch via `if-then-else`
Removed duplicate information from flows: all hash information is now carried by the key alone (the value is now basically just a `types.Counters`) and is handed over as a string (which is used for compiler optimizations in map access anyway and can be copied to the hash map a little faster during rotation)
Revamped the global packet buffer (used during flow map rotation) to distinguish between IPv4 / IPv6 (and sped up `Put` / `Get` operations in the process), allowing it to carry more than twice as many flows for the same allocated memory in standard scenarios (i.e. when IPv4 is dominant) and never fewer than before, even if all traffic were IPv6
Removed the concept of `MaybeRemains` and `MaybeReverts` for direction detection (it was a nice idea, but in practice all rules determine the direction from the first observed packet, so the additional tracking / overhead is simply not required; it could be added back later should we ever come across more complex heuristics)
Added a heuristic to "guess" whether a packet is a "request" or a "response" packet and, based on that, try the more probable hash map lookup path first (to minimize the number of cases where we have to look up twice)
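The fundamental changes above can be illustrated with a minimal Go sketch. It is not the goProbe implementation: the type names (`EPHashV4`, `EPHashV6`, `FlowMap`, `Counters`), the key layout and the port-based request/response heuristic (`likelyResponse`) are all hypothetical. It shows the IP version being checked once before dispatching to version-specific functions, the flow identity living only in the string map key with plain counters as values, the more probable lookup path being tried first, and a new flow being keyed by the first observed packet.

```go
package main

// Counters is a simplified, hypothetical stand-in for types.Counters.
type Counters struct {
	Bytes, Packets uint64
}

// Hypothetical key layouts; each endpoint (IP + port, network byte order)
// is contiguous: src IP | src port | dst IP | dst port | protocol.
type EPHashV4 [4 + 2 + 4 + 2 + 1]byte
type EPHashV6 [16 + 2 + 16 + 2 + 1]byte

// FlowMap holds only counters as values; all flow identity lives in the key.
type FlowMap map[string]Counters

// reverseV4 / reverseV6 swap both endpoints with two contiguous copies.
func reverseV4(k *EPHashV4) (r EPHashV4) {
	copy(r[0:6], k[6:12])
	copy(r[6:12], k[0:6])
	r[12] = k[12]
	return r
}

func reverseV6(k *EPHashV6) (r EPHashV6) {
	copy(r[0:18], k[18:36])
	copy(r[18:36], k[0:18])
	r[36] = k[36]
	return r
}

// likelyResponse is a hypothetical heuristic: a packet sent from a lower
// (typically well-known) port to a higher (typically ephemeral) port is
// guessed to be a response. It compares the big-endian port bytes directly.
func likelyResponse(srcHi, srcLo, dstHi, dstLo byte) bool {
	return srcHi < dstHi || (srcHi == dstHi && srcLo < dstLo)
}

// account tries the more probable key first and falls back to the other one;
// only if neither exists is a new flow created, keyed by the packet as
// observed (i.e. the first packet determines the direction).
func account(m FlowMap, key, rev []byte, response bool, size uint64) {
	first, second := key, rev
	if response {
		first, second = rev, key
	}
	for _, k := range [...][]byte{first, second} {
		// With the gc compiler, the lookup m[string(k)] does not allocate.
		if c, ok := m[string(k)]; ok {
			c.Bytes += size
			c.Packets++
			m[string(k)] = c
			return
		}
	}
	m[string(key)] = Counters{Bytes: size, Packets: 1}
}

// updateV4 / updateV6 are monomorphic: each handles exactly one IP version.
func updateV4(m FlowMap, k *EPHashV4, size uint64) {
	r := reverseV4(k)
	account(m, k[:], r[:], likelyResponse(k[4], k[5], k[10], k[11]), size)
}

func updateV6(m FlowMap, k *EPHashV6, size uint64) {
	r := reverseV6(k)
	account(m, k[:], r[:], likelyResponse(k[16], k[17], k[34], k[35]), size)
}

// handlePacket branches on the IP version exactly once and then stays on a
// version-specific path (key is assumed to be laid out as described above).
func handlePacket(m4, m6 FlowMap, ipVersion int, key []byte, size uint64) {
	switch ipVersion {
	case 4:
		var k EPHashV4
		copy(k[:], key)
		updateV4(m4, &k, size)
	case 6:
		var k EPHashV6
		copy(k[:], key)
		updateV6(m6, &k, size)
	}
}
```

With this layout, a request from `10.0.0.1:40000` to `10.0.0.2:443` followed by the corresponding response ends up in a single flow entry, and the response is found on the first lookup because the heuristic tries the reversed key first.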
Micro-optimizations:
Force-inlined (by brute force) a few methods / functions to reduce call graph depth (while at the same time moving some logic into functions that can be inlined by the compiler): the call graph now has a maximum depth of 2 beyond the capture loop itself; everything else is inlined.
Replaced the common port logic with an array-based lookup table to achieve constant-time lookups (this also makes readability & extension much easier, see below) that are almost as fast as the previous best case and, most of the time, significantly faster
Changed the memory alignment / layout of `EPHash` (both the new IPv4 and IPv6 versions) to allow copying / transferring / reversing it with fewer operations (due to contiguous memory areas that can be copied in one go)
Avoided conversions for ports; instead, operations work directly on their least and most significant bytes to speed things up
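A rough Go sketch of two of these micro-optimizations, assuming a hypothetical IPv4 key layout with ports in network byte order: a constant-time common-port lookup table indexed directly by the destination port's most and least significant bytes, and pre-aggregation that zeroes the source port bytes in place. The names (`portTable`, `newPortTable`, `preAggregate`) are illustrative, and treating pre-aggregation as "drop the source port" is an assumption of this sketch, not a statement about goProbe's code.

```go
package main

// Hypothetical IPv4 key layout (network byte order):
// src IP (4) | src port (2) | dst IP (4) | dst port (2) | protocol (1).
type EPHashV4 [13]byte

// Byte offsets of the port bytes within the hypothetical key layout.
const (
	sportHi, sportLo = 4, 5
	dportHi, dportLo = 10, 11
)

// portTable answers "is this a common destination port?" in constant time.
// It is indexed directly by the port's most and least significant bytes,
// so the key bytes never have to be combined into a uint16 first.
type portTable [256][256]bool

// newPortTable marks the given ports as common.
func newPortTable(ports ...uint16) *portTable {
	t := new(portTable)
	for _, p := range ports {
		t[p>>8][p&0xff] = true
	}
	return t
}

// preAggregate zeroes the source port of a key whose destination port is
// common, so that connections to that service collapse into fewer flows.
// It reports whether the key was modified.
func (t *portTable) preAggregate(k *EPHashV4) bool {
	if !t[k[dportHi]][k[dportLo]] {
		return false
	}
	k[sportHi], k[sportLo] = 0, 0
	return true
}
```

Extending the list (e.g. with 445 or 8080) is then just another argument to `newPortTable`. The `[256][256]bool` table costs 64 KiB; a bitset would shrink it to 8 KiB at the price of a shift and a mask per lookup.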
Misc:
Added `445/TCP` and `8080/TCP` to the list of common (destination) ports that are pre-aggregated (based on their abundance in production environments). Note: this comes on top of the benchmarks posted here and the OSAG-internal ones (I didn't want to skew the measurements, as reducing the cardinality obviously makes things faster) :wink:
Added a Prometheus gauge tracking the relative usage of the global packet buffer (per interface) during rotation
Improved tests & benchmarks in terms of coverage and meaningfulness
Added support for benchmarks based on PCAP file data (the data is used to fill a non-draining memory buffer up to its capacity, which is then replayed over and over)
Updated all upstream dependencies to address CVEs
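The PCAP-based benchmark input can be sketched as a non-draining replay buffer. This is a hypothetical Go sketch (`replayBuffer`, `newReplayBuffer` and `Next` are illustrative names): reading and decoding the PCAP file is left out, the packets are assumed to be available as raw byte slices already, and repeating the capture to fill the buffer when it holds fewer packets than the capacity is an assumption.

```go
package main

// replayBuffer is a hypothetical, non-draining in-memory packet source for
// benchmarks: it is filled once up to its capacity (e.g. with packets read
// from a PCAP file) and reading wraps around instead of consuming the data.
type replayBuffer struct {
	packets [][]byte
	pos     int
}

// newReplayBuffer fills the buffer with capacity packets, cycling over src
// if it contains fewer packets than that (an assumption of this sketch).
func newReplayBuffer(src [][]byte, capacity int) *replayBuffer {
	b := &replayBuffer{packets: make([][]byte, 0, capacity)}
	for i := 0; len(src) > 0 && i < capacity; i++ {
		b.packets = append(b.packets, src[i%len(src)])
	}
	return b
}

// Next returns the next packet and wraps around at the end, so a benchmark
// loop can run for any number of iterations. It returns nil if the buffer
// is empty.
func (b *replayBuffer) Next() []byte {
	if len(b.packets) == 0 {
		return nil
	}
	p := b.packets[b.pos]
	b.pos = (b.pos + 1) % len(b.packets)
	return p
}
```

In a Go benchmark, `Next` would be called once per iteration of the `testing.B` loop, so the measured work is the packet processing rather than disk I/O or buffer refills.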
Result summary:
Faster read-analyze-store cycle for each packet read from the wire (mostly due to micro-optimizations across the board): anywhere from a few percent to almost 50%, depending on the scenario (see benchmarks in #284).
More buffer for your bytes :wink: In real life, the buffer can now hold roughly twice as many packets during rotation without having to allocate / provide more memory (in addition to being able to populate & drain it more quickly)
Slightly faster rotation (mostly because less data is processed, and slightly more efficiently, thanks to a combination of the fundamental and micro-optimizations above)
In practice (tested under heavy production load), CPU usage was reduced by around 20% and packet drops went from several 100k (within 1h) to zero (of course that's just a single scenario, but it shows the potential)
Closes #284