b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License
1.02k stars, 68 forks

What about mobile phones? #89

Open dcale opened 3 weeks ago

dcale commented 3 weeks ago

At Acurast we're building a distributed network of mobile devices that are dedicated to the network as compute units. The network currently counts 5000 mobile devices, and it would be great to run Distributed Llama on those devices and benchmark the limits. Has anyone tried to run Distributed Llama on Android/iOS?

b4rtaz commented 3 weeks ago

@dcale Distributed Llama supports ARM CPUs, so I believe it should work (it might require some adjustments).

I'm not aware of anyone trying to run it on a phone.

DifferentialityDevelopment commented 3 weeks ago

I think one of the issues with implementing this is that you can't simply open a port on an (unrooted) Android phone, though I could be wrong here. The worker and root both need an open port so they can communicate with each other. Right now the root expects the workers to already be running on a particular port, and then the root node connects to them, so you'd probably need the workers to initiate the connection to the root instead. The main difficulty is how the worker/root communication would be accomplished; you'd probably still need the root to be a PC that isn't running Android.

You also mentioned you have a pool of 5000 devices. Right now there is a problem where the number of nodes cannot scale beyond the number of KV heads (correct me if I'm wrong here @b4rtaz).

Totally do-able though.
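
To make the KV-head limit concrete, here is a minimal sketch of how one might pick a node count from a large device pool. It is not taken from the Distributed Llama code: the function name `max_usable_nodes` is hypothetical, the value of 8 KV heads is only an illustrative input, and the requirement that the node count divide the KV heads evenly is an assumption (the thread only states that nodes cannot exceed the number of KV heads).

```python
def max_usable_nodes(pool_size: int, n_kv_heads: int) -> int:
    """Return the largest node count that fits both limits.

    Limit from the thread: nodes <= n_kv_heads.
    Assumption (not confirmed by the thread): KV heads are split evenly,
    so the node count must also divide n_kv_heads.
    """
    if pool_size < 1 or n_kv_heads < 1:
        raise ValueError("pool_size and n_kv_heads must be positive")
    upper = min(pool_size, n_kv_heads)
    # n = 1 always divides n_kv_heads, so the generator is never empty.
    return max(n for n in range(1, upper + 1) if n_kv_heads % n == 0)


if __name__ == "__main__":
    # Hypothetical example: a 5000-device pool and a model with 8 KV heads.
    print(max_usable_nodes(5000, 8))
```

Under these assumptions, a pool of 5000 phones would not help a single inference run beyond a handful of nodes; the rest of the pool could only be used for separate, independent runs.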

DifferentialityDevelopment commented 3 weeks ago

There are lots of ways around the communication issue, but if a port cannot be opened on the Android phone side, I believe the workers will need to initiate the connection with the root. Something like a fixed-interval ping every second until a connection is achieved: you start up the root, wait a few seconds for worker auto-discovery (all workers initiating connections with the root), and then bidirectional communication at least is sorted.
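
The idea above could be sketched roughly as follows. This is only an illustration in Python using the standard `socket` module, not how Distributed Llama currently works (today the root dials the workers). The helper names `connect_with_retry` and `collect_workers`, the one-second interval, and the discovery window are all assumptions made for the example.

```python
import socket
import time


def connect_with_retry(host, port, interval=1.0, max_attempts=30):
    """Worker side: dial the root every `interval` seconds until it accepts."""
    for attempt in range(1, max_attempts + 1):
        try:
            sock = socket.create_connection((host, port), timeout=interval)
            sock.settimeout(None)  # back to blocking mode once connected
            return sock
        except OSError:
            # Root is not up yet (or unreachable); give up only on the last try.
            if attempt == max_attempts:
                raise
            time.sleep(interval)


def collect_workers(server, expected, window=5.0):
    """Root side: accept workers until `expected` arrive or `window` seconds pass.

    `server` must already be bound and listening.
    """
    deadline = time.monotonic() + window
    workers = []
    while len(workers) < expected:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        server.settimeout(remaining)
        try:
            conn, _addr = server.accept()
        except socket.timeout:
            break
        workers.append(conn)
    return workers
```

Once a worker's outbound connection is accepted, the same TCP socket carries traffic in both directions, so the phone never needs to accept an inbound connection.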

On that note @b4rtaz it would be great if you didn't need to restart the workers each time the root node exits.

dcale commented 2 weeks ago

> There are lots of ways around the communication issue, but if a port cannot be opened on the Android phone side, I believe the workers will need to initiate the connection with the root. Something like a fixed-interval ping every second until a connection is achieved: you start up the root, wait a few seconds for worker auto-discovery (all workers initiating connections with the root), and then bidirectional communication at least is sorted.
>
> On that note @b4rtaz it would be great if you didn't need to restart the workers each time the root node exits.

You can open ports, so that should not be the issue. I'll play around with this and let you know what I find.