Rank divider for subdomain and leading dimension handling improvements.

Keeping the subdomain as a trailing dimension requires special handling of the state outside of RankDivider, which holds the core functionality for handling subdomains. This PR attempts to make the subdomain one of the leading dimensions, so that array multiplications parallelize over it naturally and less special handling is needed outside of RankDivider.

Currently this takes the form of an extra module for tests and a showcase of how it simplifies the train.py script. We can move whatever seems reasonable into domain.py after e2e testing. Many shape-specific adjustments had to be made to the tests. We might consider more granular tests plus a single integration test instead of the many integration tests, since updating them was a lot of work.

Refactored public API:

RankXYDivider: a rank divider that handles dividing ranks with or without overlaps
- provides methods to decompose data into subdomains ([leading_time], subdomains, x, y, feature) and to flatten/reshape any data that matches the trailing features
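
For reviewers who want a picture of the target layout, here is a minimal NumPy sketch of the decomposition described above. It is not the RankXYDivider implementation: `decompose_xy`, `flatten_trailing`, the `layout` argument, and the assumption that the input already carries `overlap` halo points on each x/y edge are all hypothetical and chosen only for illustration.

```python
import numpy as np


def decompose_xy(data, layout, overlap=0):
    """Split (..., x, y, feature) into (..., subdomain, x_sub, y_sub, feature).

    Hypothetical helper: `data` is assumed to already include `overlap`
    halo points on every x/y edge, and `layout` is the number of
    subdomains along x and y.
    """
    n_x, n_y = layout
    interior_x = data.shape[-3] - 2 * overlap
    interior_y = data.shape[-2] - 2 * overlap
    if interior_x % n_x or interior_y % n_y:
        raise ValueError("interior extent must divide evenly by the layout")
    size_x, size_y = interior_x // n_x, interior_y // n_y

    blocks = []
    for i in range(n_x):
        for j in range(n_y):
            # Each block keeps its interior plus `overlap` points per side.
            xs = slice(i * size_x, (i + 1) * size_x + 2 * overlap)
            ys = slice(j * size_y, (j + 1) * size_y + 2 * overlap)
            blocks.append(data[..., xs, ys, :])
    # New subdomain axis sits directly in front of (x, y, feature).
    return np.stack(blocks, axis=-4)


def flatten_trailing(subdomain_data):
    """Collapse the trailing (x, y, feature) dimensions into one axis."""
    return subdomain_data.reshape(*subdomain_data.shape[:-3], -1)


# 5 time steps, 8x8 interior with a 1-point halo, 2 features
field = np.random.default_rng(0).normal(size=(5, 10, 10, 2))
subdomains = decompose_xy(field, layout=(2, 2), overlap=1)
flat = flatten_trailing(subdomains)
```

With this layout the optional leading time dimension stays in front and the subdomain axis comes next, so anything operating on the flattened trailing axis sees one row per (time, subdomain) pair.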

Significant internal changes:

Reservoir weight matrices (W_in and W_res) now default to CSC sparse matrices, giving a 75% runtime reduction for the reservoir increment step (notebook example)
Reservoir and readouts now handle a leading subdomain dimension for natural matrix multiplication parallelization
The batch processing helper function removes the halo points from the hybrid inputs and reservoir outputs for training
- The rank manipulations are also enforced using RankXYDivider with no overlap
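
To illustrate the first two items, the sketch below shows a reservoir increment with CSC weights and a leading subdomain dimension. It is an assumption-laden illustration, not the code in this PR: `increment_state`, `predict`, the tanh update, the array sizes, the weights shared across subdomains, and the per-subdomain readout coefficients are all hypothetical.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n_sub, n_in, n_state, n_out = 4, 6, 50, 3

# Sparse weights in CSC format (sizes and densities are illustrative only).
W_in = sparse.random(n_in, n_state, density=0.2, format="csc", random_state=0)
W_res = sparse.random(n_state, n_state, density=0.05, format="csc", random_state=1)


def increment_state(state, inputs, w_in, w_res):
    """Advance all subdomains in one step.

    state:  (n_sub, n_state), inputs: (n_sub, n_in).
    The sparse matrix is kept on the left of each product, so the
    result of every multiplication is a dense ndarray.
    """
    from_input = (w_in.T @ inputs.T).T
    from_state = (w_res.T @ state.T).T
    return np.tanh(from_input + from_state)


def predict(state, coefficients):
    """Per-subdomain linear readout: (n_sub, n_state) -> (n_sub, n_out)."""
    return np.einsum("sf,sfo->so", state, coefficients)


state = np.zeros((n_sub, n_state))
inputs = rng.normal(size=(n_sub, n_in))
coefficients = rng.normal(size=(n_sub, n_state, n_out))

state = increment_state(state, inputs, W_in, W_res)
prediction = predict(state, coefficients)
```

Because the subdomain is the leading axis, a single matrix product updates every subdomain at once and no Python loop over subdomains is needed.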

- [x] Tests added

ai2cm / fv3net

Rank divider for subdomain and leading dimension handling improvements. #2289