Pass the global_config to MeshHostWorker's constructor. This allows a user to specify NCCL-related configs and other settings.
Support manual sharding of intermediate tensors. Previously, we could only set data parallelism for intermediate stages with global inputs such as attention_mask.