bschwind opened 1 year ago
Thinking about it a bit more, it sounds like disk IO is by far the biggest bottleneck, but the work is currently structured so that each read has to finish and be processed before the next read is even issued.
The CPU should be much faster at crunching the data than the disk can provide it, so we want to make sure requests for data from the disk are always queued up. The time you spend processing the binary data is time you're not spending queuing up more data to read. Even the two files you read here could have their reads issued in parallel to keep the disk as busy as possible (of course you'll have to profile to see if that's actually faster).
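As a minimal sketch of that last point, the two reads could be issued on scoped threads; the file names here are hypothetical placeholders for whatever the real code opens:

```rust
use std::{fs, io, thread};

fn main() -> io::Result<()> {
    // Hypothetical file names; substitute the two files the real code reads.
    // Spawning both reads at once keeps a request queued at the disk while
    // the other read is being serviced.
    let (first, second) = thread::scope(|s| {
        let a = s.spawn(|| fs::read("index.bin"));
        let b = s.spawn(|| fs::read("data.bin"));
        (a.join().unwrap(), b.join().unwrap())
    });
    let (first, second) = (first?, second?);

    println!("read {} and {} bytes", first.len(), second.len());
    Ok(())
}
```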
I wouldn't suggest `async` right away here, but essentially you want to set up a pipeline of sorts where you throw all data-fetching requests into a queue which a thread pool (or something like it) will constantly chew through. Once a particular disk worker loads the requested data from the file, it passes it along to whatever is crunching the actual numbers, and then it grabs the next request from the queue.
This could be modeled as straight-up threads, a thread pool from something like `rayon`, or an async pipeline running in a multi-threaded async executor like `tokio`. The point is to always have read requests in flight, because otherwise the kernel and disk sit idle waiting for the next read command.
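For reference, here's a rough sketch of that pipeline using nothing but std threads and channels; the file name, chunk size, and number of requests are made up for illustration, and a `rayon` or `tokio` version would have the same shape:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// One unit of work for a disk worker (field values below are hypothetical).
struct ReadRequest {
    path: &'static str,
    offset: u64,
    len: usize,
}

fn main() {
    let (request_tx, request_rx) = mpsc::channel::<ReadRequest>();
    let (data_tx, data_rx) = mpsc::channel::<Vec<u8>>();

    // std's Receiver is single-consumer, so share it behind a mutex
    // (crossbeam-channel would let you clone the receiver instead).
    let request_rx = Arc::new(Mutex::new(request_rx));

    // A handful of disk workers that constantly chew through the queue.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let request_rx = Arc::clone(&request_rx);
            let data_tx = data_tx.clone();
            thread::spawn(move || loop {
                // Take the next request; stop once the queue is closed and drained.
                let request = match request_rx.lock().unwrap().recv() {
                    Ok(req) => req,
                    Err(_) => break,
                };
                let mut file = File::open(request.path).unwrap();
                file.seek(SeekFrom::Start(request.offset)).unwrap();
                let mut buf = vec![0u8; request.len];
                file.read_exact(&mut buf).unwrap();
                data_tx.send(buf).unwrap();
            })
        })
        .collect();
    drop(data_tx); // The processing loop below ends when all workers finish.

    // Queue every read up front so the disk always has requests in flight.
    for i in 0..64u64 {
        let req = ReadRequest { path: "data.bin", offset: i * 4096, len: 4096 };
        request_tx.send(req).unwrap();
    }
    drop(request_tx);

    // Crunch the numbers as chunks arrive, in whatever order they complete.
    let mut total_bytes = 0;
    for chunk in data_rx {
        total_bytes += chunk.len();
    }
    for worker in workers {
        worker.join().unwrap();
    }
    println!("processed {total_bytes} bytes");
}
```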
In the "real world" I don't know how this will turn out with a spinning disk due to physical limitations, but in theory it should help reduce the serial trickle of read commands the current code is issuing. Maybe issuing too many read requests to the disk will just slow it down to a halt, so it's also worth profiling this to see if there is some optimal number of concurrent read requests to issue.
Some random thoughts for possible performance improvements (though running a profiler will always help you narrow down where things are taking time):

- Read from the files in parallel, for example with a thread pool from `rayon`. It should keep the kernel busy requesting the regions of the files you want to read.

Here are some rough calculations from the numbers I'm seeing in the sample file: