feat req: Scalable obs sequences #745

Currently every processor in DART reads the entire observation sequence into memory.

total memory = obs_seq_size * num_procs
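
As a rough illustration of the formula above, here is a minimal sketch comparing the replicated footprint with an idealized distributed one. The sizes and process counts are hypothetical, chosen only to show the arithmetic; they are not measurements from DART.

```python
GIB = 1024 ** 3


def replicated_total_bytes(obs_seq_bytes: int, num_procs: int) -> int:
    """Total memory when every process holds the full obs sequence."""
    return obs_seq_bytes * num_procs


def distributed_per_proc_bytes(obs_seq_bytes: int, num_procs: int) -> int:
    """Per-process memory if the obs sequence were split evenly (ideal case,
    ignoring any halo/duplication a real decomposition might need)."""
    return -(-obs_seq_bytes // num_procs)  # ceiling division


# Hypothetical numbers: a 2 GiB obs sequence on 1024 processes.
obs_seq_bytes = 2 * GIB
num_procs = 1024

total_replicated = replicated_total_bytes(obs_seq_bytes, num_procs)
per_proc_distributed = distributed_per_proc_bytes(obs_seq_bytes, num_procs)

print(f"replicated total:     {total_replicated / GIB:.0f} GiB")
print(f"distributed per proc: {per_proc_distributed / 1024 ** 2:.0f} MiB")
```

In the replicated case the per-process cost stays at the full obs sequence size no matter how many processes are added, which is why the total grows linearly with core count.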

[Figure from Kamil Yousuf]

In addition, obs sequence reads and writes are done by a single processor, which anti-scales: the more processors in the run, the more of them sit idle waiting on I/O.

This is no longer sufficient, because of:

increasing numbers of observations (e.g. satellite obs)
high resolution DA (large core counts, wasting cycles on single-core I/O)
particularly when obs sequences contain external forward operators (more per-core memory). Side note: the obs sequence is maybe not the place to read/write external FOs, but that is the current design.
AI models (we may want these to be subroutine-callable and to run many windows in one filter run)

Kamil Yousuf (Rhodes College, SiParCS) worked on reading obs sequences for multiple time windows: ~1/2 billion observations read and distributed. Kamil also has a parallel sort, and is working on parallel writes. Kamil's approach assumes that the length of each observation is calculable (predictable), which is not currently guaranteed in general (but could be). Kamil's fork: https://github.com/tyiop794/DART (also has an obs seq test harness).
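
To show why a calculable observation length matters for parallel reads, here is a minimal sketch, not Kamil's implementation and not the DART obs_seq file format. It assumes a hypothetical binary file with a fixed-size header followed by fixed-length records; the record layout (`RECORD`), the function names, and the header are all made up for illustration. With a known record size, each rank can compute its own byte offset and read only its block, with no rank scanning the whole file. A serial loop over ranks stands in for MPI processes.

```python
import os
import struct
import tempfile

# Hypothetical fixed-length record: int32 id + float64 value, no padding (12 bytes).
RECORD = struct.Struct("<id")


def block_range(num_obs, num_procs, rank):
    """Return (start, count) of the contiguous block of obs owned by `rank`."""
    base, extra = divmod(num_obs, num_procs)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return start, count


def read_my_obs(path, header_bytes, num_obs, num_procs, rank):
    """Seek straight to this rank's block and read only those records."""
    start, count = block_range(num_obs, num_procs, rank)
    with open(path, "rb") as f:
        # The offset is computable only because every record has the same length.
        f.seek(header_bytes + start * RECORD.size)
        raw = f.read(count * RECORD.size)
    return [RECORD.unpack_from(raw, i * RECORD.size) for i in range(count)]


def write_demo_file(path, header, obs):
    """Write a header followed by fixed-length records (demo data only)."""
    with open(path, "wb") as f:
        f.write(header)
        for rec in obs:
            f.write(RECORD.pack(*rec))


def demo_roundtrip(num_obs=10, num_procs=4):
    """Check that the per-rank reads together reproduce the full sequence."""
    obs = [(i, i * 0.5) for i in range(num_obs)]
    header = b"HDR!"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "obs.bin")
        write_demo_file(path, header, obs)
        gathered = []
        for rank in range(num_procs):  # stand-in for MPI ranks
            gathered.extend(read_my_obs(path, len(header), num_obs, num_procs, rank))
    return gathered == obs


print(demo_roundtrip())
```

If record lengths were not calculable, each rank would need either a prior pass over the file or an index of offsets before it could seek, which is the guarantee the paragraph above says is currently missing in general.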

Folder in Specs for Obs_Seq_IO
