Open nrs-status opened 7 months ago
Hi! If you're still interested, or if someone else finds this:
A minimal shard is a single transformer block. The load-balancing algorithm can be found here: https://github.com/bigscience-workshop/petals/blob/main/src/petals/server/block_selection.py
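To give a rough idea of what block-level load balancing can look like, here is a minimal sketch. It is not the code in block_selection.py: the function name `choose_block_span` and the per-block throughput list are hypothetical, and the rule used (serve the contiguous run of blocks that currently has the least total throughput) is a simplifying assumption for illustration only.

```python
from typing import List


def choose_block_span(throughputs: List[float], num_blocks: int) -> range:
    """Pick the contiguous span of `num_blocks` blocks with the lowest total throughput.

    `throughputs[i]` is assumed to be the combined throughput that existing
    servers already provide for transformer block i (hypothetical input).
    Ties are resolved in favor of the earliest span.
    """
    if not 0 < num_blocks <= len(throughputs):
        raise ValueError("num_blocks must be between 1 and the number of blocks")

    # Sliding-window sum over all candidate start positions.
    window = sum(throughputs[:num_blocks])
    best_start, best_total = 0, window
    for start in range(1, len(throughputs) - num_blocks + 1):
        window += throughputs[start + num_blocks - 1] - throughputs[start - 1]
        if window < best_total:
            best_start, best_total = start, window

    return range(best_start, best_start + num_blocks)


# Example: blocks 3 and 4 are not served by anyone yet, block 2 only weakly.
current = [3.0, 3.0, 1.0, 0.0, 0.0, 2.0, 3.0, 3.0]
span = choose_block_span(current, num_blocks=3)
print(list(span))
```

In a private cluster where you want to know in advance which instance holds which blocks, a fixed mapping from instance to block range would replace a heuristic like this one entirely.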
Hi, I was wondering if anyone has recommendations on which parts of the library I should look at to understand how models are sharded, and whether this sharding can be managed manually. Above all, is there a way to load a shard from an already sharded model, for instance in a private cluster, so that it is known in advance which instance will hold which shard? Any reading suggestions in the repo are appreciated!