bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Add "Podman" usage to the documentation #546

Open metal3d opened 6 months ago

metal3d commented 6 months ago

Hello,

First of all, I'm very happy that this project exists. I was able to try StableBeluga2 thanks to the community, which shares small parts of its GPUs just like I do. That's very impressive!

As a Fedora Linux user, I use Podman instead of Docker. It works exactly like Docker in terms of performance.

The method to follow is described here: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

In short, for Fedora:

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install nvidia-container-toolkit
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# then edit /etc/nvidia-container-runtime/config.toml
# to replace:
# [nvidia-container-cli]
# #no-cgroups = false
# no-cgroups = true
# and
# [nvidia-container-runtime]
# #debug = "/var/log/nvidia-container-runtime.log"
# debug = "~/.local/nvidia-container-runtime.log"
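
Before starting Petals, it may be worth sanity-checking the CDI setup the way the NVIDIA documentation suggests: list the generated device names and run nvidia-smi in a throwaway container (the ubuntu image below is just an example image, anything small works):

nvidia-ctk cdi list
podman run --rm \
    --device nvidia.com/gpu=all \
    --security-opt=label=disable \
    ubuntu nvidia-smi -L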

Then, launching the Petals server is easy:

podman run -p 31330:31330 \
    --ipc host \
    --device nvidia.com/gpu=all  \
    --security-opt=label=disable \
    --volume petals-cache:/cache \
    --rm \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 petals-team/StableBeluga2

As you can see, the only differences from the Docker command are a security option (to disable SELinux labeling) and the way the GPU device is passed.
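
For comparison, the Docker command from the README is roughly the following (please double-check the README for the current version):

sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 petals-team/StableBeluga2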

That works like a charm on my RTX 3070.
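
If you want to keep the server running in the background, the same command also works detached. I dropped --rm so the container sticks around, and the container name "petals" is just an example I picked:

podman run -d --name petals -p 31330:31330 \
    --ipc host \
    --device nvidia.com/gpu=all \
    --security-opt=label=disable \
    --volume petals-cache:/cache \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 petals-team/StableBeluga2
# follow the server logs
podman logs -f petals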

Maybe you can add this yourselves, or would you like me to create the page/section in the documentation?