USC-NSL / DisagMoE

Apache License 2.0

[Enhancement] faster broadcast with sharding #7

Open · hogura99 opened 8 hours ago

hogura99 commented 8 hours ago

Using NCCLBroadcast to send metadata and tensors degrades execution performance under tensor parallelism.

Proposal: shard the payload and pipeline the broadcast, following https://arxiv.org/abs/2211.05322
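A minimal sketch of one way a sharded broadcast could work. It assumes the scatter-then-ring-all-gather scheme that MPI libraries commonly use for large messages. It is not DisagMoE code and may differ from the scheme in the linked paper. Ranks are simulated as Python lists, and the helpers `shard`, `naive_broadcast`, and `sharded_broadcast` are hypothetical. The baseline is a flat broadcast in which the root sends the full tensor to every peer. That baseline serves only for comparison and is not how NCCL implements broadcast internally. In the sharded version the root sends each shard once during the scatter, and then every rank forwards shards around a ring. As a result, the busiest rank sends roughly 2x the tensor size instead of (world_size - 1)x.

```python
from typing import List, Tuple


def shard(data: List[float], world_size: int) -> List[List[float]]:
    """Split data into world_size contiguous shards (first shards get the remainder)."""
    base, extra = divmod(len(data), world_size)
    shards, start = [], 0
    for r in range(world_size):
        size = base + (1 if r < extra else 0)
        shards.append(data[start:start + size])
        start += size
    return shards


def naive_broadcast(data: List[float], world_size: int, root: int = 0) -> Tuple[List[List[float]], int]:
    """Flat baseline: root sends the whole tensor to each peer.

    Returns (per-rank buffers, elements sent by the root).
    """
    buffers = [list(data) for _ in range(world_size)]
    return buffers, len(data) * (world_size - 1)


def sharded_broadcast(data: List[float], world_size: int, root: int = 0) -> Tuple[List[List[float]], int]:
    """Hypothetical sharded broadcast: scatter shards, then ring all-gather.

    Returns (per-rank buffers, max elements sent by any single rank).
    """
    shards = shard(data, world_size)
    sent = [0] * world_size

    # Step 1 (scatter): root sends shard r to rank r; each rank starts with its own shard.
    held = [{r: shards[r]} for r in range(world_size)]
    for r in range(world_size):
        if r != root:
            sent[root] += len(shards[r])

    # Step 2 (ring all-gather): at step s, rank r forwards shard (r - s) mod W to rank r + 1.
    # The shard forwarded at step s was received at step s - 1, so shards stream
    # around the ring in a pipelined fashion.
    for step in range(world_size - 1):
        for r in range(world_size):
            idx = (r - step) % world_size
            dst = (r + 1) % world_size
            held[dst][idx] = held[r][idx]
            sent[r] += len(shards[idx])

    # Reassemble the full tensor on every rank in shard order.
    buffers = [[x for i in range(world_size) for x in held[r][i]] for r in range(world_size)]
    return buffers, max(sent)


if __name__ == "__main__":
    data = [float(i) for i in range(8)]
    _, flat_root_sent = naive_broadcast(data, world_size=4)
    bufs, max_sent = sharded_broadcast(data, world_size=4)
    assert all(b == data for b in bufs)
    print(flat_root_sent, max_sent)
```

For 8 elements over 4 ranks, the script prints `24 12`: the flat root sends 24 elements, while the busiest rank in the sharded scheme sends 12. In a real implementation the scatter and all-gather would map onto collective calls, and overlapping the metadata broadcast with tensor shards is a further design choice this sketch does not model.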