flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

feat: initial support of distributed operators #289

Closed. yzh119 closed this 1 month ago.

yzh119 commented 1 month ago

This PR implements the attention all-reduce kernel, which will be used to merge attention states from different GPUs in sequence parallelism.
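For context, a minimal PyTorch sketch of the math such a kernel fuses with communication: each rank holds a partial attention output and its log-sum-exp over the local KV shard, and the exact global output is a log-sum-exp-weighted sum of the parts. The function name, tensor layout, and list-based input are illustrative, not the PR's actual kernel interface.

```python
import torch

def merge_attention_states(o_parts, lse_parts):
    """Merge partial attention outputs computed over disjoint KV shards.

    o_parts:   list of partial outputs, each [..., head_dim]
    lse_parts: list of matching log-sum-exp tensors, each [...]
    Returns the exactly merged output and its global log-sum-exp.
    (Hypothetical helper; illustrates the math, not the PR's API.)
    """
    o = torch.stack(o_parts)      # [num_parts, ..., head_dim]
    lse = torch.stack(lse_parts)  # [num_parts, ...]
    # Global normalizer across all shards.
    lse_merged = torch.logsumexp(lse, dim=0)
    # Per-shard weight: fraction of the global softmax mass it covers.
    weights = torch.exp(lse - lse_merged).unsqueeze(-1)
    o_merged = (weights * o).sum(dim=0)
    return o_merged, lse_merged
```

In sequence parallelism, each GPU would run attention over its KV shard, exchange the (output, log-sum-exp) pairs across ranks, and apply a merge like the one above; the kernel in this PR performs that merge as part of the collective rather than as a separate post-processing step.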

We use mscclpp for collective communication. Thanks to @liangyurain for teaching me how to use mscclpp.

Co-authored-by: Liangyu Zhao <liangyu@cs.washington.edu>