flexflow / FlexFlow

FlexFlow Serve: Low-Latency, High-Performance LLM Serving
https://flexflow.readthedocs.io
Apache License 2.0
1.59k stars 218 forks source link

FlexFlow should correctly use concurrent tasks #1378

Open rohany opened 2 months ago

rohany commented 2 months ago

Flexflow is currently protecting all tasks with NCCL with an execution fence, which is correct, but unnecessary. Concurrent tasks do not need a fence, but must register themselves as concurrent, and use the concurrent barrier before and after the allreduce. This enables some overlap of the NCCL operations with other GPU operations without violating NCCL rules.

See some of these commits for details: https://github.com/rohany/FlexFlow/commit/cc3f0e00134b0a39c9f667b4f98f0f1250d10e89 https://github.com/rohany/FlexFlow/commit/ed08e74a4c3c285d09b8bf3a74fcba6728f4c05a https://github.com/rohany/FlexFlow/commit/2c1ceb2b34f2d3813a767d4c279f59e28b7e6e56

jiazhihao commented 2 months ago

@rohany Thanks for the info! We will change to use concurrent tasks. To questsions:

rohany commented 2 months ago

Why you added a checkCUDA(cudaDeviceSynchronize()) after runtime->concurrent_task_barrier(ctx)?

I think that this is unnecessary, i had added it a while ago and never ended up removing it. The ground truth for what to do is Mike's example here: https://gitlab.com/StanfordLegion/legion/-/blob/master/examples/nccl/nccl_legion_demo.cu?ref_type=heads.

Were you able to resolve the hanging issue after the changes?

yes, i was able to scale to 32 GPUs without hangs.