exo-explore / exo

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
GNU General Public License v3.0
14.46k stars, 775 forks

[BOUNTY - $500] Pipeline Parallel Inference #4

Open AlexCheema opened 4 months ago

AlexCheema commented 4 months ago

Prerequisite: https://github.com/exo-explore/exo/issues/1

Motivation: exo should use device resources as efficiently as possible. The current implementation underutilises available resources.

What: See https://pytorch.org/docs/stable/pipeline.html
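To illustrate the motivation, here is a minimal sketch of a pipeline schedule. It is not exo's code: it assumes the model is split into `num_stages` consecutive slices, one per device, and that each stage takes one time step per request. `pipeline_schedule` and `utilization` are hypothetical helper names used only for this illustration.

```python
def pipeline_schedule(num_stages, num_requests):
    """Return, for each time step, the request index each stage works on.

    None means the stage (device) is idle at that step.
    Assumes every stage takes exactly one time step per request.
    """
    steps = []
    for t in range(num_stages + num_requests - 1):
        row = []
        for stage in range(num_stages):
            # Request r reaches stage s at time r + s.
            request = t - stage
            row.append(request if 0 <= request < num_requests else None)
        steps.append(row)
    return steps


def utilization(schedule):
    """Fraction of (time step, stage) slots that are doing work."""
    busy = sum(slot is not None for row in schedule for slot in row)
    return busy / (len(schedule) * len(schedule[0]))


# One request in flight: only one of the two devices is busy at any time.
single = pipeline_schedule(num_stages=2, num_requests=1)
# Three requests in flight: the devices overlap their work.
overlapped = pipeline_schedule(num_stages=2, num_requests=3)
print(utilization(single), utilization(overlapped))
```

Under these assumptions, keeping several requests in flight raises device utilization, which is the effect pipeline parallel inference is meant to achieve.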

Reward: $500 bounty, paid out in USDC on Ethereum. Email alex@exolabs.net.

Myestery commented 4 months ago

I'd like to work on this

AlexCheema commented 3 months ago

> I'd like to work on this

That would be excellent! I can help here and on Discord with any questions / issues you have.

the-alex-b commented 3 months ago

Hi there,

I was looking at what it would take to make this work and did some testing. I found that when you start two chat sessions and run inference at the same time, they interfere with each other: tokens from the two sessions bleed into each other. See the last two messages:

image

The one on the left hangs after a while; the one on the right finishes, but its output is also gibberish. Does this reproduce on your end? I think fixing session isolation may need to come first, before pipeline parallelism.

AlexCheema commented 3 months ago

@the-alex-b Very interesting - you're totally right, we should fix session isolation first. This makes sense, since both sessions would share the same KV caches (it's stateful). What we really need is the ability to create multiple instances of the same model that only hold the weights in memory once.

This can still be part of the same bounty.
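A minimal sketch of that idea, under loud assumptions: the weights are loaded once and shared read-only, while each request gets its own cache keyed by a request ID. `SharedWeights`, `Session`, and `Engine` are hypothetical names, the list stands in for a real KV cache, and the arithmetic stands in for a real forward pass. None of this is exo's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SharedWeights:
    """Stand-in for model weights: loaded once, never mutated."""
    scale: int = 2


@dataclass
class Session:
    """Per-request state; the list stands in for a KV cache."""
    kv_cache: list = field(default_factory=list)


class Engine:
    def __init__(self, weights):
        self.weights = weights  # one copy, shared by every session
        self.sessions = {}      # request_id -> Session

    def step(self, request_id, token):
        # Look up (or create) the cache that belongs to this request only.
        session = self.sessions.setdefault(request_id, Session())
        session.kv_cache.append(token)
        # Toy "forward pass": depends on the shared weights and on this
        # session's own history, never on another session's cache.
        return self.weights.scale * sum(session.kv_cache)

    def end(self, request_id):
        # Free the per-request cache when the chat session finishes.
        self.sessions.pop(request_id, None)


engine = Engine(SharedWeights())
engine.step("chat-a", 1)
engine.step("chat-b", 10)
# Interleaved steps no longer see each other's history.
print(engine.step("chat-a", 2), engine.step("chat-b", 1))
```

The design choice here is to keep all mutable state in the per-request `Session`, so concurrent chats can interleave steps without their tokens bleeding into each other.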

pranav4501 commented 2 months ago

Hi @AlexCheema, can I work on session isolation?

AlexCheema commented 2 months ago

> Hi @AlexCheema, can I work on session isolation?

Hey @pranav4501, I think @varshith15 is already working on that, so it's best to check with him whether you can contribute.

Can you also DM me on Discord so we can find a good task for you? I can update the bounties with something you'd be interested in working on, as there aren't that many left now!

pranav4501 commented 2 months ago

Hi @AlexCheema, I DM'ed you on Discord. I will also take a look at the Stable Diffusion bounty.

moosh3 commented 4 days ago

Hello, can we update the GSheet to denote this is taken (if it is, which it seems to be)? cc @AlexCheema [apologies for the pings]