Open hogura99 opened 8 hours ago
Using NCCLBroadcast to send metadata and tensors hurts execution performance in tensor parallelism.
Proposal: shard the data and pipeline the broadcast, see https://arxiv.org/abs/2211.05322
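
To make the idea concrete, here is a rough pure-Python toy model of a sharded, pipelined broadcast along a chain of ranks. It does not call NCCL and is not taken from the linked paper. The function names (`split_into_shards`, `pipelined_chain_broadcast`), the chain topology, and the "one shard per link per step" cost model are all assumptions made for illustration.

```python
from typing import List, Optional


def split_into_shards(payload: bytes, num_shards: int) -> List[bytes]:
    """Split payload into at most num_shards contiguous, near-equal pieces."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    size = -(-len(payload) // num_shards) or 1  # ceiling division, at least 1 byte
    return [payload[i:i + size] for i in range(0, len(payload), size)] or [b""]


def pipelined_chain_broadcast(payload: bytes, world_size: int, num_shards: int):
    """Simulate rank 0 broadcasting along the chain 0 -> 1 -> ... -> world_size-1.

    Hypothetical cost model: in each step every rank forwards at most one shard
    to its successor, so different shards travel on different links at once.
    Returns (payload as reassembled on each rank, number of steps taken).
    """
    shards = split_into_shards(payload, num_shards)
    # received[r][k] holds shard k on rank r once it has arrived.
    received: List[List[Optional[bytes]]] = [
        [None] * len(shards) for _ in range(world_size)
    ]
    received[0] = list(shards)
    sent = [0] * world_size  # sent[r] = shards rank r has forwarded so far
    steps = 0
    while any(s is None for s in received[-1]):
        # Pick this step's transfers from the state at the start of the step,
        # so a shard cannot hop over two links within a single step.
        transfers = [
            (r, sent[r])
            for r in range(world_size - 1)
            if sent[r] < len(shards) and received[r][sent[r]] is not None
        ]
        for r, k in transfers:
            received[r + 1][k] = received[r][k]
            sent[r] += 1
        steps += 1
    return [b"".join(r) for r in received], steps
```

In this model, 4 ranks and 8 shards take 10 steps that each move 1/8 of the payload, while the unsharded case takes 3 steps that each move the whole payload. Real gains depend on the interconnect and on how NCCL schedules the transfers.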