This PR changes our hybrid sharding to allow multiple replicas within a node. One benefit is that hybrid sharding can now express no sharding at all: set the number of replicas equal to the world size. We then no longer need DDP or NO_SHARD (which is being deprecated).
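To illustrate the arithmetic, here is a minimal sketch of how ranks could be split into shard groups and replica groups. It assumes contiguous rank placement and an evenly divisible world size; `hybrid_groups` is a hypothetical helper for illustration, not the code in this PR.

```python
from typing import List, Tuple


def hybrid_groups(world_size: int, num_replicas: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical layout: split ranks into `num_replicas` contiguous shard groups."""
    if num_replicas < 1 or world_size % num_replicas != 0:
        raise ValueError("num_replicas must evenly divide world_size")

    # Number of ranks that share one copy of the model.
    shard_size = world_size // num_replicas

    # Ranks in the same shard group each hold a different slice of the parameters.
    shard_groups = [
        list(range(r * shard_size, (r + 1) * shard_size))
        for r in range(num_replicas)
    ]

    # Ranks in the same replica group hold the same slice and reduce gradients together.
    replica_groups = [
        list(range(i, world_size, shard_size))
        for i in range(shard_size)
    ]
    return shard_groups, replica_groups
```

Under these assumptions, `hybrid_groups(8, 8)` yields eight shard groups of one rank each and a single replica group spanning all ranks, so every rank holds the full model and gradients are reduced across the whole world, which is the no-sharding case. With 8 GPUs per node, `hybrid_groups(8, 4)` gives shard groups of two ranks, i.e. four replicas within one node.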