Most of the steps are compatible with shard mapping. This means we assign a shard of the input data to each device and apply the kernel to it locally, with the option of operating across shards when needed.
The step function must be shard-mapped.
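A minimal sketch of what shard-mapping a step could look like, assuming JAX's `shard_map` is the mechanism meant here. The mesh axis name `freq`, the function `step`, and the doubling kernel are hypothetical placeholders, not the real step function. The `psum` shows one way to operate across shards.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, PartitionSpec as P

try:  # newer JAX exposes shard_map at the top level
    from jax import shard_map
except ImportError:  # older JAX keeps it under jax.experimental
    from jax.experimental.shard_map import shard_map

# One mesh axis, named "freq" here (an assumption), spanning all devices.
devices = np.array(jax.devices())
mesh = Mesh(devices, ("freq",))


def step(vis):
    # vis is the local shard only: [local_freq, row].
    local = vis * 2.0  # hypothetical stand-in for the real kernel
    # Cross-shard operation: sum the local totals over every shard.
    total = jax.lax.psum(jnp.sum(local), axis_name="freq")
    return local, total


sharded_step = jax.jit(
    shard_map(
        step,
        mesh=mesh,
        in_specs=(P("freq"),),          # split the leading axis over devices
        out_specs=(P("freq"), P()),     # sharded output, replicated scalar
    )
)

n_freq = 4 * devices.size  # must divide evenly across devices
vis = jnp.arange(n_freq * 3, dtype=jnp.float32).reshape(n_freq, 3)
scaled, total = sharded_step(vis)
```

On a single device this runs as one shard. With more devices, the leading axis is split evenly between them.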
Notes:
A shard should be exactly one solution interval of frequencies. Otherwise, we need to gather data from other shards to get what a solution needs, which is more complicated.
Some things are clearly replicated, e.g. uvw, and we don't want to recompute them every time. Therefore, we should break the step into two steps: one that only operates on local shards, and another that precomputes anything that should be replicated. Otherwise, work is repeated on each shard. This may be unavoidable.
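The two notes above can be sketched together, again assuming JAX's `shard_map`. The names `precompute_replicated` and `local_step`, the baseline-length quantity, and the cosine weighting are hypothetical illustrations, not the actual computation. Each shard holds exactly one solution interval, so the reduction over frequency needs no collective. The uvw-derived quantity is computed once outside the shard-mapped function and passed in as a replicated input.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, PartitionSpec as P

try:  # newer JAX exposes shard_map at the top level
    from jax import shard_map
except ImportError:  # older JAX keeps it under jax.experimental
    from jax.experimental.shard_map import shard_map

devices = np.array(jax.devices())
mesh = Mesh(devices, ("freq",))  # axis name "freq" is an assumption


def precompute_replicated(uvw):
    # Step 1: runs once, outside shard_map. Hypothetical replicated
    # quantity: baseline length per row, shape [row].
    return jnp.linalg.norm(uvw, axis=-1)


def local_step(vis, freqs, baseline_len):
    # Step 2: sees only one solution interval of frequencies.
    # vis: [freqs_per_solint, row], freqs: [freqs_per_solint],
    # baseline_len: [row] (replicated on every shard).
    weights = jnp.cos(freqs[:, None] * baseline_len[None, :])
    # Reduce over the local frequency axis: one solution per shard.
    return jnp.mean(vis * weights, axis=0, keepdims=True)


sharded_local_step = jax.jit(
    shard_map(
        local_step,
        mesh=mesh,
        in_specs=(P("freq"), P("freq"), P()),  # P() marks replicated input
        out_specs=P("freq"),
    )
)

n_solints = devices.size   # one solution interval per device
freqs_per_solint = 4       # hypothetical interval size
n_freq = n_solints * freqs_per_solint
n_row = 5

freqs = jnp.linspace(1.0, 2.0, n_freq)
uvw = jnp.arange(n_row * 3, dtype=jnp.float32).reshape(n_row, 3) / 10.0
vis = jnp.ones((n_freq, n_row), dtype=jnp.float32)

baseline_len = precompute_replicated(uvw)               # computed once
solutions = sharded_local_step(vis, freqs, baseline_len)  # [n_solints, row]
```

If a solution interval spanned several shards, the mean would need a collective such as `jax.lax.psum` over the mesh axis, which is the extra complication noted above.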