Allow distributed edge splitting over out_dim

Description

Adds dist_split_over: Literal["out_dim", "dataset"] = "dataset" to the rib_build config. It selects whether the edge calculation is distributed by splitting over out_dim or over the dataset.
Splitting over out_dim is supported for functional and stochastic edges.
Removes the unused "skip_ci" flag in conftest.
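
For readers unfamiliar with the field, here is a minimal sketch of how an option of this shape behaves. BuildConfigSketch is a hypothetical stand-in and not the real rib_build config class; only the field name, its allowed values, and its default are taken from this PR.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Allowed values and default are taken from the PR description.
DistSplitOver = Literal["out_dim", "dataset"]


@dataclass(frozen=True)
class BuildConfigSketch:
    """Hypothetical stand-in for the rib_build config; only the new field is shown."""

    # The default keeps the existing behaviour (split over the dataset).
    dist_split_over: DistSplitOver = "dataset"

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal types at runtime, so check by hand.
        allowed = get_args(DistSplitOver)
        if self.dist_split_over not in allowed:
            raise ValueError(f"dist_split_over must be one of {allowed}")


default_cfg = BuildConfigSketch()
out_dim_cfg = BuildConfigSketch(dist_split_over="out_dim")
```

Existing configs that do not set the field keep splitting over the dataset, so the change is opt-in.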

Related Issue

Motivation and Context

Splitting over the dataset is impractical for large runs that use only a few samples of length n_ctx, because there are too few samples to split efficiently across processes. Splitting over out_dim instead lets us distribute the computation much more effectively.
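
To illustrate the idea (this is not the code in this PR), the sketch below assigns each process a contiguous block of output indices. The helper name out_dim_chunk and its rank/world_size arguments are assumptions made for the example. With, say, 2 samples and 4 processes, a dataset split leaves two processes idle, whereas an out_dim of 10 can still be shared among all four.

```python
def out_dim_chunk(out_dim: int, rank: int, world_size: int) -> range:
    """Contiguous block of output indices for `rank`; block sizes differ by at most one."""
    base, remainder = divmod(out_dim, world_size)
    # The first `remainder` ranks each take one extra index.
    start = rank * base + min(rank, remainder)
    stop = start + base + (1 if rank < remainder else 0)
    return range(start, stop)


# Hypothetical sizes: 10 output dimensions shared among 4 processes.
chunks = [out_dim_chunk(10, rank, 4) for rank in range(4)]
```

Each process would then compute only the edge entries for its own block, and the per-process results would be combined afterwards.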

How Has This Been Tested?

Added the following tests for modadd with dist_split_over="out_dim", mirroring the existing modadd tests that split over the dataset:

Add test for squared edge formula without stochastic sources
Add test for squared edge formula with 3 stochastic sources

NOTE: Multiple distributed tests cannot be run within a single test-suite process without breaking things; each MPI test must run in its own process. I've therefore added a tests/run_distributed_tests.sh script that runs each MPI test in a separate process, and a --runmpi flag. MPI tests are skipped unless --runmpi is given.
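
The script's contents are not reproduced here. As a rough Python equivalent of the idea, the sketch below launches one fresh process per MPI test. The mpirun launcher, the process count, and the test IDs are all assumptions for illustration; only the --runmpi flag comes from this PR.

```python
import subprocess
import sys
from typing import List, Sequence


def build_mpi_test_commands(test_ids: Sequence[str], n_procs: int = 2) -> List[List[str]]:
    """One command per test, so that no two MPI tests ever share a process."""
    return [
        # `mpirun -n` is an assumed launcher invocation; adapt to the local MPI setup.
        ["mpirun", "-n", str(n_procs), sys.executable, "-m", "pytest", "--runmpi", test_id]
        for test_id in test_ids
    ]


def run_each_in_own_process(commands: Sequence[Sequence[str]]) -> int:
    """Run the commands one after another and return how many exited non-zero."""
    failures = 0
    for cmd in commands:
        result = subprocess.run(cmd, check=False)
        failures += int(result.returncode != 0)
    return failures


# Hypothetical test IDs; the real test names are not given in this description.
commands = build_mpi_test_commands(
    ["tests/test_example.py::test_a", "tests/test_example.py::test_b"]
)
```

Passing commands to run_each_in_own_process would execute them sequentially, so one failing test does not stop the rest from running.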
