bschwind opened 1 year ago
Thinking about it a bit more, it sounds like disk IO is by far the biggest bottleneck, but the work is currently structured so that each read has to finish and be processed before the next read is even issued.
The CPU should be much faster at crunching the data than the disk can provide it, so we want to make sure requests for data from the disk are always queued up. The time you spend processing the binary data is time you're not spending queuing up more data to read. Even the two files you read here could have their reads issued in parallel to keep the disk as busy as possible (of course you'll have to profile to see if that's actually faster).
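As a minimal sketch of that last point, the two reads could be issued on scoped threads; the file names here are hypothetical placeholders for whatever the real code opens:

```rust
use std::{fs, io, thread};

fn main() -> io::Result<()> {
    // Hypothetical file names; substitute the two files the real code reads.
    // Spawning both reads at once keeps a request queued at the disk while
    // the other read is being serviced.
    let (first, second) = thread::scope(|s| {
        let a = s.spawn(|| fs::read("index.bin"));
        let b = s.spawn(|| fs::read("data.bin"));
        (a.join().unwrap(), b.join().unwrap())
    });
    let (first, second) = (first?, second?);

    println!("read {} and {} bytes", first.len(), second.len());
    Ok(())
}
```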
I wouldn't suggest `async` right away here, but essentially you want to set up a pipeline of sorts where you throw all data-fetching requests into a queue which a thread pool (or something like it) will constantly chew through. Once a particular disk worker loads the requested data from the file, it passes it along to whatever is crunching the actual numbers, and then it grabs the next request from the queue.
This could be modeled as straight-up threads, a thread pool from something like `rayon`, or an async pipeline running in a multi-threaded async executor like `tokio`. The point is to always have read requests in flight, because otherwise the kernel and disk sit idle waiting for the next read command.
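For reference, here's a rough sketch of that pipeline using nothing but std threads and channels; the file name, chunk size, and number of requests are made up for illustration, and a `rayon` or `tokio` version would have the same shape:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// One unit of work for a disk worker (field values below are hypothetical).
struct ReadRequest {
    path: &'static str,
    offset: u64,
    len: usize,
}

fn main() {
    let (request_tx, request_rx) = mpsc::channel::<ReadRequest>();
    let (data_tx, data_rx) = mpsc::channel::<Vec<u8>>();

    // std's Receiver is single-consumer, so share it behind a mutex
    // (crossbeam-channel would let you clone the receiver instead).
    let request_rx = Arc::new(Mutex::new(request_rx));

    // A handful of disk workers that constantly chew through the queue.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let request_rx = Arc::clone(&request_rx);
            let data_tx = data_tx.clone();
            thread::spawn(move || loop {
                // Take the next request; stop once the queue is closed and drained.
                let request = match request_rx.lock().unwrap().recv() {
                    Ok(req) => req,
                    Err(_) => break,
                };
                let mut file = File::open(request.path).unwrap();
                file.seek(SeekFrom::Start(request.offset)).unwrap();
                let mut buf = vec![0u8; request.len];
                file.read_exact(&mut buf).unwrap();
                data_tx.send(buf).unwrap();
            })
        })
        .collect();
    drop(data_tx); // The processing loop below ends when all workers finish.

    // Queue every read up front so the disk always has requests in flight.
    for i in 0..64u64 {
        let req = ReadRequest { path: "data.bin", offset: i * 4096, len: 4096 };
        request_tx.send(req).unwrap();
    }
    drop(request_tx);

    // Crunch the numbers as chunks arrive, in whatever order they complete.
    let mut total_bytes = 0;
    for chunk in data_rx {
        total_bytes += chunk.len();
    }
    for worker in workers {
        worker.join().unwrap();
    }
    println!("processed {total_bytes} bytes");
}
```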
In the "real world" I don't know how this will turn out with a spinning disk due to physical limitations, but in theory it should help reduce the serial trickle of read commands the current code is issuing. Maybe issuing too many read requests to the disk will just slow it down to a halt, so it's also worth profiling this to see if there is some optimal number of concurrent read requests to issue.
Some random thoughts for possible performance improvements (though running a profiler will always help you narrow down where things are taking time):

- Read from the files in parallel, for example with a thread pool from `rayon`. It should keep the kernel busy requesting the regions of the files you want to read.

Here are some rough calculations from the numbers I'm seeing in the sample file: