Stream reading - Githubissues

FredeJ commented 3 years ago

I've implemented a read_stream method for reading a file as a stream, rather than reading the full file into memory immediately.

The reason for doing this is that I want to enable reading from a FIFO. That way, I can do mkfifo /dev/fifo and tcpdump ... -w /dev/fifo and then build a python application that continuously reads from that FIFO. I can then use this python application to pipe converted data into where ever I need it.

I believe there might be some things missing in the code: specifically, everything in the calculate_size function I'd like some feedback on - why is it necessary and what does it do? Does it just filter when we get bad packets?

Gi-z commented 3 years ago

Hi, thanks for this PR!

I'm only really able to devote time to features I'm personally working with, so realtime parsing hasn't been a huge focus of mine. That said, I'm obviously very interested and would definitely like to get this merged.

Allow me to explain what's going on with calculate_size. Once we've parsed a pcap, the actual contents can be in several different states.

They might be None, indicating that that the buffer is either invalid or terminates early.
The size of the payload isn't listed in the valid sizes list (for all bandwidths).
The size of the payload is inconsistent with other observed sizes in the stream.

Check 2 is important as there'd be no point attempting to parse a CSI payload if we couldn't expect it to parse to a familiar bandwidth/subcarrier count. Check 3 aims to deal with inconsistent bandwidths. Since the goal of CSIKit is to be able to assemble a CSI matrix (frames subcarriers rx * tx), an inconsistent number of subcarriers would not fit in this matrix. On the first parsed frame, an expected bandwidth is established so inconsistent frames can be discarded. For most use cases, this should be sufficient. In cases where CSI might be expected from multiple sources, that might not be the best course of action.

As for getting this merged, the implementation is looking good however I'd like to consider the performance impact of switching the whole parser to stream reading. With a very rough test using PowerShell's Measure-Command on my system with nexmon\example_4366c0_4x4mimo.pcap from the repo, I'm seeing it's taking ~2.7s to run on current master, while it's roughly ~5.7s on the PR. When originally working on the read_bfee implementation, I noticed a similar performance dropoff when reading the file bytes at a time. There may be a better approach to this, but I understand that you've got a specific use-case in mind.

If you're happy with the performance in its current state, I'd be happy to merge this functionality alongside (rather than replacing) the existing implementation. We do have a few good options in considering further optimisations too, given overhead is precious in realtime applications. You might notice I've experimented with numba's JIT compiler in byteops which yields a significant performance increase with virtually no code changes.

Since I'm requesting these changes, I'd be happy to put some work into this myself. Thanks again, and I hope we can get this merged!

FredeJ commented 3 years ago

It's not completely clear how I can reproduce the results you're seeing. I've tried running pytest test_nexmon.py with no luck. I have ideas for how to potentially improve performance, but I'd like to be able to measure it :)

If you're happy with the current state, a merge would be appreciated. That would make it a lot easier for me to deploy things to my application, as I would just be able to use pip install in that case! :)

Gi-z commented 3 years ago

Not sure what happened with those performance tests last night. As of just now, I don't seem to be able to observe a performance deficit between the two versions, apologies. I'm going to merge this just now. Thanks again. Do let me know how your streamed application performs!

Gi-z commented 3 years ago

Just finished resolving some issues that cropped up during the merge. Should be pushing a new version live on pypi shortly, just got another feature I'm aiming to include.

FredeJ commented 3 years ago

@Gi-z Any news on pushing to pypi? :) I'd love to just be able to retrieve the version I need from there.

Gi-z commented 3 years ago

@Gi-z Any news on pushing to pypi? :) I'd love to just be able to retrieve the version I need from there.

Is live on PyPi now. Thanks again.

Gi-z / CSIKit

Stream reading #16