DeepLink-org/deeplink.framework
[DIPU] Implement all_to_all_single (with unequal splits) and all_to_all
#842
Closed
jfxu-st closed 2 months ago
jfxu-st commented 2 months ago
Done
Implement all_to_all_single with equal splits
Implement all_to_all
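For readers unfamiliar with the collective, here is a minimal single-process sketch of what all_to_all_single computes when split sizes are given, the case named in the issue title. It is plain Python over lists, not DIPU or torch.distributed code; the function name `simulate_all_to_all_single` and its argument layout are assumptions made for illustration only.

```python
def simulate_all_to_all_single(inputs, input_splits):
    """Simulate all_to_all_single across ranks inside one process.

    inputs[r]          flat buffer (a list) held by rank r
    input_splits[r][d] number of elements rank r sends to rank d
    Returns outputs, where outputs[d] is the concatenation, in source-rank
    order, of the chunks addressed to rank d.
    """
    world = len(inputs)

    # Cut every rank's flat buffer into one chunk per destination.
    chunks = []
    for r in range(world):
        assert len(input_splits[r]) == world
        assert sum(input_splits[r]) == len(inputs[r])
        row, offset = [], 0
        for n in input_splits[r]:
            row.append(inputs[r][offset:offset + n])
            offset += n
        chunks.append(row)

    # Rank d receives chunk [s][d] from every source rank s.
    outputs = []
    for d in range(world):
        out = []
        for s in range(world):
            out.extend(chunks[s][d])
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Two ranks with unequal splits.
    print(simulate_all_to_all_single(
        [[0, 1, 2], [10, 11, 12, 13]],
        [[1, 2], [3, 1]],
    ))
```

With equal splits, every entry of `input_splits[r]` is simply `len(inputs[r]) // world`.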
Leave to the future
For CUDA, use NCCL group calls for all_to_all_single and all_to_all (this requires designing new APIs to support group calls)
Implement a more performant fallback for all_to_all without using a flattened tensor for relay
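To illustrate why a flattened relay costs extra copies, the sketch below simulates one plausible shape of such a fallback in a single process: pack the per-destination pieces into one flat buffer, exchange, then unpack. This is an assumption about the general pattern, not a description of DIPU's actual code; the name `all_to_all_via_flat_relay` and the list-based layout are hypothetical.

```python
def all_to_all_via_flat_relay(send_lists):
    """Simulate all_to_all through a flattened relay buffer.

    send_lists[r][d] is the piece (a list) that rank r sends to rank d.
    Returns result, where result[d][s] is the piece rank d got from rank s.
    """
    world = len(send_lists)

    # Copy 1 (pack): each rank flattens its pieces into one buffer.
    flat = [[x for piece in send_lists[r] for x in piece] for r in range(world)]
    splits = [[len(piece) for piece in send_lists[r]] for r in range(world)]

    # Exchange: stand-in for a single all_to_all_single call.
    recv_flat = []
    for d in range(world):
        buf = []
        for s in range(world):
            start = sum(splits[s][:d])
            buf.extend(flat[s][start:start + splits[s][d]])
        recv_flat.append(buf)

    # Copy 2 (unpack): each rank slices its flat receive buffer per source.
    result = []
    for d in range(world):
        out, offset = [], 0
        for s in range(world):
            n = splits[s][d]
            out.append(recv_flat[d][offset:offset + n])
            offset += n
        result.append(out)
    return result
```

The pack and unpack steps are the overhead that a relay-free fallback, or grouped point-to-point calls, would avoid.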