bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License
9.26k stars 526 forks

Running on SLURM #164

Open ghost opened 1 year ago

ghost commented 1 year ago

Having to hard-code IP addresses makes it very hard to run Petals on a SLURM cluster. There, I submit batch jobs that are then run on some node of the partition I specified, so I do not know beforehand the IP of the node (or of any nodes) that will run a Petals server instance.

So one thing that would be helpful is "self-discovery" of Petals server instances inside a specified network.

justheuristic commented 1 year ago

#include sorry_for_slow_response.h

Hi!

Can you please explain how you specify the node's address? Unless there's some special networking wizardry on that cluster, you should be able to specify 0.0.0.0 instead of your IP address in --host_maddrs, and it should work normally.
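
For illustration, a minimal launch along those lines might look like this (the model name and port are placeholders, not a recommendation; check python -m petals.cli.run_server --help for the exact flags in your version):

# Bind to all interfaces instead of a hard-coded IP (placeholder model name and port)
python -m petals.cli.run_server bigscience/bloom-petals \
    --host_maddrs /ip4/0.0.0.0/tcp/31330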

While we figure this out, here's a quick workaround that should work on most machines:

# Detect this machine's public IPv4 address
export IPV4=$(dig -4 TXT +short o-o.myaddr.l.google.com @ns1.google.com | tr -d '"')
# or, for IPv6: export IPV6=$(dig -6 TXT +short o-o.myaddr.l.google.com @ns1.google.com | tr -d '"')

# if you do not have a public IPv4 / IPv6 address, the corresponding variable will be empty

# test
echo "run_stuff --host_maddrs /ip4/$IPV4/tcp/1337"

@Vahe1994 is also working on an automatic relaying script to make this even easier to set up; we will keep you updated in this issue.

ghost commented 1 year ago

Hello, thanks for the reply! I was wondering: what would be the advantage of running a private Petals network instead of a torch.distributed or Hugging Face Accelerate run? Sorry if the question seems very basic to you.

justheuristic commented 1 year ago

Hi! If you have a swarm where all nodes have the same GPU / network specs and are 100% reliable, you should prefer torch.distributed, or even deepspeed.inference.

If your GPUs are preemptible, e.g. other people sometimes want to use them and you need to shut down some of the nodes, Petals can handle that, while torch.distributed would require a lot of extra effort.
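
To make the "private Petals network" idea from the question above concrete, here is a rough sketch: servers can be pointed at each other with --initial_peers so that the swarm stays entirely inside your cluster. The flags, model name, and multiaddr below are illustrative and from memory of the Petals CLI (the --new_swarm option may be named differently in your version), so please verify them against --help:

# First server starts a fresh private swarm and prints its full multiaddr on startup
python -m petals.cli.run_server bigscience/bloom-petals --new_swarm

# Other servers join that private swarm via its address (placeholder multiaddr / peer ID)
python -m petals.cli.run_server bigscience/bloom-petals \
    --initial_peers /ip4/10.0.0.1/tcp/31337/p2p/QmPeerIdPlaceholder

Clients then pass the same list of initial peers when loading the distributed model, so they never touch the public swarm.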

mryab commented 1 year ago

One small addition to @justheuristic's response: as far as I know, neither torch.distributed nor DS-Inference provides a full-fledged setup for running a model inference server, only the building blocks for parallelism and various inference optimizations. That's fine if you want to implement the actual server yourself, but if you need a complete solution for exposing models to external requests, you'd be better off with something like Triton (or Petals!).