bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Manual management of shards #573

Open nrs-status opened 7 months ago

nrs-status commented 7 months ago

Hi, I was wondering if anyone could recommend which parts of the library to look at to understand how models are sharded, and whether this sharding can be managed manually. Most of all, is there a way to load a specific shard from an already sharded model, for instance in a private cluster, so that it is known in advance which instance will hold which shard? Any reading suggestions in the repo are appreciated!

justheuristic commented 4 months ago

Hi! If you're still interested, or if someone else finds this:

A minimal shard is a single transformer block. The load-balancing algorithm can be found here: https://github.com/bigscience-workshop/petals/blob/main/src/petals/server/block_selection.py
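For the private-cluster case, where you want to know in advance which instance holds which blocks, here is a minimal sketch of a manual plan. It is not the Petals load-balancing code and does not call any Petals API: `assign_block_ranges`, the node names, and the per-node capacities are all hypothetical. It only splits the block indices into contiguous half-open ranges, one per server. How a range is then passed to each server (for example, through a block-indices style option on the server launcher) is an assumption to check against the server's own help output.

```python
from typing import Dict, Tuple


def assign_block_ranges(
    num_blocks: int, capacities: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Split blocks 0..num_blocks-1 into contiguous [start, end) ranges.

    `capacities` maps a (hypothetical) server name to the maximum number of
    transformer blocks it can hold. Servers are filled in insertion order.
    """
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    if sum(max(c, 0) for c in capacities.values()) < num_blocks:
        raise ValueError("total capacity is smaller than the number of blocks")

    assignment: Dict[str, Tuple[int, int]] = {}
    start = 0
    for name, capacity in capacities.items():
        if start >= num_blocks:
            break  # every block already has an owner
        if capacity <= 0:
            continue  # this server cannot hold any block
        end = min(start + capacity, num_blocks)
        assignment[name] = (start, end)
        start = end
    return assignment


# Hypothetical example: a 24-block model on three nodes holding up to 10 blocks each.
plan = assign_block_ranges(24, {"node-a": 10, "node-b": 10, "node-c": 10})
for name, (start, end) in plan.items():
    print(f"{name}: blocks {start}:{end}")
```

Because each range is fixed up front, the mapping from instance to blocks is known before any server starts, which is the opposite of what the automatic algorithm in `block_selection.py` is designed for.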