Open Pommaq opened 4 months ago
Thank you for this detailed issue as well. This tool was originally hacked together quickly and so this issue doesn't really surprise me. Just wanted to respond to say this may take me longer to address than the other issue since it's a bit more complicated.
No probs, I'll throw together a solution :) I think I made a neat one although it slightly changes the prints from the tool
My personal use case for this tool was to filter a very large pcap file into smaller files representing individual TCP sessions, matching the extract command. It succeeded in doing so, but I noticed a few issues that can be resolved in a relatively simple manner.
Converting linear scaling to something closer to O(1) (guesstimate)
The solution is to replace the vector with a hashmap, or a hashset, or similar. Basically something with O(1) lookup time. We can do this since "s.is_stream(&si)" simply compares StreamInfo, which is static data for the session.
Here is an example solution. Note that we lose the order in which we saw each stream; the fix is to add an integer to "Stream" and increment it each time no Stream was available in the map.
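A minimal sketch of the idea, for illustration only. `StreamInfo`, `Stream`, `Streams`, and their fields are hypothetical stand-ins here, and the tool's real types will differ. It shows the hashmap lookup plus the first-seen counter that recovers the original ordering:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the tool's per-session key (assumed hashable).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct StreamInfo {
    pub src: ([u8; 4], u16),
    pub dst: ([u8; 4], u16),
}

/// Hypothetical stand-in for the per-session state.
#[derive(Debug)]
pub struct Stream {
    /// Position in first-seen order (0 for the first stream, then 1, ...).
    pub index: usize,
    pub packet_count: usize,
}

#[derive(Default)]
pub struct Streams {
    by_info: HashMap<StreamInfo, Stream>,
    next_index: usize,
}

impl Streams {
    /// Average O(1) lookup; replaces a linear scan that calls `is_stream`
    /// on every known stream.
    pub fn get_or_insert(&mut self, si: StreamInfo) -> &mut Stream {
        let next_index = &mut self.next_index;
        self.by_info.entry(si).or_insert_with(|| {
            // Only runs when no Stream was available in the map.
            let index = *next_index;
            *next_index += 1;
            Stream { index, packet_count: 0 }
        })
    }

    /// Recovers the order in which streams were first seen.
    pub fn in_first_seen_order(&self) -> Vec<(&StreamInfo, &Stream)> {
        let mut streams: Vec<_> = self.by_info.iter().collect();
        streams.sort_by_key(|(_, s)| s.index);
        streams
    }
}
```

Per packet, the caller would do something like `streams.get_or_insert(si).packet_count += 1;` and only sort by `index` once, when printing results.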
High Ram consumption
This stems from this tool retaining all seen pcap packets in an internal vector and only writing them to disk once it's extracted everything.
The solution is to not do that. In the previously proposed hashmap, simply map each StreamInfo to a channel Writer. Let the reading end(s) write data into the corresponding pcap file, or perform tallying or scanning of the pcap file, depending on the subcommand issued. Adapting everything to this is simple, since it would essentially mean those ends iterate over a Receiver instead of a vector.
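A rough sketch of that routing, using only the standard library. Everything named here (`Router`, `route`, `finish`, the `StreamInfo` fields, the capacity of 64) is hypothetical, and the sink writes raw bytes where a real version would use a proper pcap writer. A bounded `sync_channel` is assumed so that a slow consumer applies backpressure instead of letting packets pile up in memory:

```rust
use std::collections::HashMap;
use std::io::{self, Write};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the tool's per-session key.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct StreamInfo {
    pub src_port: u16,
    pub dst_port: u16,
}

pub type Packet = Vec<u8>;

/// Forwards each packet to its stream's consumer instead of buffering it.
pub struct Router<W, F> {
    senders: HashMap<StreamInfo, SyncSender<Packet>>,
    workers: Vec<JoinHandle<io::Result<W>>>,
    make_sink: F,
}

impl<W: Write + Send + 'static, F: FnMut(&StreamInfo) -> W> Router<W, F> {
    pub fn new(make_sink: F) -> Self {
        Router { senders: HashMap::new(), workers: Vec::new(), make_sink }
    }

    pub fn route(&mut self, si: StreamInfo, packet: Packet) {
        if !self.senders.contains_key(&si) {
            // Assumed capacity: at most 64 packets in flight per stream.
            let (tx, rx) = sync_channel::<Packet>(64);
            let mut sink = (self.make_sink)(&si);
            self.workers.push(thread::spawn(move || -> io::Result<W> {
                // Iterating a Receiver ends once its sender is dropped.
                for pkt in rx {
                    sink.write_all(&pkt)?;
                }
                Ok(sink)
            }));
            self.senders.insert(si.clone(), tx);
        }
        // A send only fails if the consumer already exited (e.g. I/O error);
        // that error is surfaced later by `finish`.
        let _ = self.senders[&si].send(packet);
    }

    /// Closes every channel and waits for the consumers to drain.
    pub fn finish(self) -> io::Result<Vec<W>> {
        drop(self.senders);
        let mut sinks = Vec::new();
        for worker in self.workers {
            sinks.push(worker.join().expect("consumer thread panicked")?);
        }
        Ok(sinks)
    }
}
```

For simplicity this spawns one consumer thread per stream, which would not scale to captures with many sessions. A single consumer fed by one shared channel carrying `(StreamInfo, Packet)` pairs would keep the thread count fixed.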
The reader can be implemented by:
I'd recommend the second option. It's slightly more complex but limits this tool to 2 threads (or 1 if it's implemented asynchronously, but that rewrite sounds annoying).