Open · wzLLM opened this issue 2 months ago

Hello, thank you very much for open-sourcing such an excellent project. I have run into a problem: when I run a large model across two Mac computers, inference is no faster than on a single Mac. I would like to ask how to solve this. Thank you very much.
Currently exo does pipeline-parallel inference, which is faster than offloading when a single device can't fit the entire model. If a single device can fit the entire model, then just run it on that device -- no need for exo.
We're working on other kinds of parallelism that will improve inference speed as you add more devices.
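
To make this concrete, here is a minimal Python sketch (not exo's code or API; the layer timings and the `partition` helper are made up for illustration) of why pipeline parallelism doesn't lower single-request latency: each token still has to pass through every layer in sequence, whichever device holds it.

```python
import time

def make_layer(delay_s):
    """Stand-in for a transformer layer: sleep to simulate compute time."""
    def layer(x):
        time.sleep(delay_s)
        return x + 1
    return layer

# Toy "model": 8 layers, each taking roughly 10 ms of compute.
layers = [make_layer(0.01) for _ in range(8)]

def partition(layers, num_devices):
    """Split the layers into contiguous stages, one stage per device."""
    per_stage = (len(layers) + num_devices - 1) // num_devices
    return [layers[i:i + per_stage] for i in range(0, len(layers), per_stage)]

def pipeline_forward(stages, x):
    """Run a single token through every stage in order, as one request must."""
    for stage in stages:
        for layer in stage:
            x = layer(x)
    return x

for num_devices in (1, 2):
    stages = partition(layers, num_devices)
    start = time.perf_counter()
    pipeline_forward(stages, 0)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{num_devices} device(s): ~{elapsed_ms:.0f} ms per token")
```

Both runs take roughly the same time per token: what the second device buys you in this setup is memory capacity (fitting a model a single machine can't hold), not faster decoding of a single request.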
Got it, thank you for your answer.