bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Direct server-to-server communication during finetuning #560

Open dvmazur opened 3 months ago

dvmazur commented 3 months ago

This PR is meant to implement direct server-to-server communication via push messages, similar to those used in rpc_inference.
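
For intuition, here is the pattern in miniature: during a forward pass through a pipeline of servers, each server pushes its output activations straight to the next server rather than returning them to the client. The sketch below uses plain asyncio with an in-process queue standing in for the network; all names (`Server`, `push`, `run`) are hypothetical stand-ins, not the actual Petals/hivemind API.

```python
# Minimal sketch of push-style server-to-server forwarding.
# Each "server" holds a slice of the model and an inbox for pushed activations;
# the in-process queue stands in for an RPC push over the network.
import asyncio
from typing import Optional


class Server:
    def __init__(self, name: str, scale: float, next_server: Optional["Server"] = None):
        self.name = name
        self.scale = scale          # stand-in for this server's transformer blocks
        self.next_server = next_server
        self.inbox: asyncio.Queue = asyncio.Queue()

    async def push(self, activations: float) -> None:
        # In Petals this would be a push message over the p2p transport;
        # here it is just an in-process queue.
        await self.inbox.put(activations)

    async def run(self, results: asyncio.Queue) -> None:
        activations = await self.inbox.get()
        output = activations * self.scale  # "run" this server's blocks
        if self.next_server is not None:
            # Direct server-to-server hop: no round-trip through the client.
            await self.next_server.push(output)
        else:
            await results.put(output)  # last server hands the result back


async def main() -> None:
    results: asyncio.Queue = asyncio.Queue()
    c = Server("C", scale=2.0)
    b = Server("B", scale=3.0, next_server=c)
    a = Server("A", scale=5.0, next_server=b)

    # The client only contacts the first server; activations then travel A -> B -> C.
    await a.push(1.0)
    await asyncio.gather(*(s.run(results) for s in (a, b, c)))
    print("final activations:", await results.get())  # 1.0 * 5 * 3 * 2 = 30.0


if __name__ == "__main__":
    asyncio.run(main())
```

The actual PR presumably replaces the in-process queue with RPC push messages over hivemind's p2p transport, mirroring how rpc_inference already forwards activations between servers.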

Note to self: minimal testing scenario

Run a server:

python -m petals.cli.run_server Maykeye/TinyLLama-v0 --num_blocks 8 --new_swarm --identity_path server1.id --host_maddrs /ip4/127.0.0.1/tcp/1337

Here --new_swarm starts a fresh private swarm instead of joining the public one, --identity_path keeps the server's peer ID stable across restarts, and --host_maddrs sets the multiaddr the server listens on.

Then open ./examples/workbench_call_rpc_directly.ipynb and replace INITIAL_PEERS and peer_id_string with the values from the server you just started (a sketch of plausible values follows below).
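
For reference, here is roughly what those two values should look like given the command above. The peer ID shown is a placeholder, not a real value: copy the actual ID from the server's startup logs (it is derived from server1.id, so it stays the same across restarts).

```python
# Hypothetical values for the notebook, matching the server command above.
# Replace the placeholder peer ID with the one your server logs at startup.
INITIAL_PEERS = ["/ip4/127.0.0.1/tcp/1337/p2p/QmYourServerPeerID"]
peer_id_string = "QmYourServerPeerID"

# If the ID is needed as an object rather than a string, hivemind provides:
# import hivemind
# peer_id = hivemind.PeerID.from_base58(peer_id_string)
```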