bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Direct server-to-server communication during finetuning #560

Open dvmazur opened 3 months ago

dvmazur commented 3 months ago

This PR is meant to implement direct server-to-server communication via push messages, similar to those used in rpc_inference.
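
For intuition, here is the pattern in miniature: during a forward pass through a pipeline of servers, each server pushes its output activations straight to the next server rather than returning them to the client. The sketch below uses plain asyncio with an in-process queue standing in for the network; all names (`Server`, `push`, `run`) are hypothetical stand-ins, not the actual Petals/hivemind API.

```python
# Minimal sketch of push-style server-to-server forwarding.
# Each "server" holds a slice of the model and an inbox for pushed activations;
# the in-process queue stands in for an RPC push over the network.
import asyncio
from typing import Optional


class Server:
    def __init__(self, name: str, scale: float, next_server: Optional["Server"] = None):
        self.name = name
        self.scale = scale          # stand-in for this server's transformer blocks
        self.next_server = next_server
        self.inbox: asyncio.Queue = asyncio.Queue()

    async def push(self, activations: float) -> None:
        # In Petals this would be a push message over the p2p transport;
        # here it is just an in-process queue.
        await self.inbox.put(activations)

    async def run(self, results: asyncio.Queue) -> None:
        activations = await self.inbox.get()
        output = activations * self.scale  # "run" this server's blocks
        if self.next_server is not None:
            # Direct server-to-server hop: no round-trip through the client.
            await self.next_server.push(output)
        else:
            await results.put(output)  # last server hands the result back


async def main() -> None:
    results: asyncio.Queue = asyncio.Queue()
    c = Server("C", scale=2.0)
    b = Server("B", scale=3.0, next_server=c)
    a = Server("A", scale=5.0, next_server=b)

    # The client only contacts the first server; activations then travel A -> B -> C.
    await a.push(1.0)
    await asyncio.gather(*(s.run(results) for s in (a, b, c)))
    print("final activations:", await results.get())  # 1.0 * 5 * 3 * 2 = 30.0


if __name__ == "__main__":
    asyncio.run(main())
```

The actual PR presumably replaces the in-process queue with RPC push messages over hivemind's p2p transport, mirroring how rpc_inference already forwards activations between servers.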

Note to self: minimal testing scenario

Run a server:

python -m petals.cli.run_server Maykeye/TinyLLama-v0 --num_blocks 8 --new_swarm --identity_path server1.id --host_maddrs /ip4/127.0.0.1/tcp/1337

Here --new_swarm starts a fresh private swarm instead of joining the public one, --identity_path keeps the server's peer ID stable across restarts, and --host_maddrs sets the multiaddr the server listens on.

Then open ./examples/workbench_call_rpc_directly.ipynb and replace INITIAL_PEERS and peer_id_string with the values from the server you just started (a sketch of plausible values follows below).
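
For reference, here is roughly what those two values should look like given the command above. The peer ID shown is a placeholder, not a real value: copy the actual ID from the server's startup logs (it is derived from server1.id, so it stays the same across restarts).

```python
# Hypothetical values for the notebook, matching the server command above.
# Replace the placeholder peer ID with the one your server logs at startup.
INITIAL_PEERS = ["/ip4/127.0.0.1/tcp/1337/p2p/QmYourServerPeerID"]
peer_id_string = "QmYourServerPeerID"

# If the ID is needed as an object rather than a string, hivemind provides:
# import hivemind
# peer_id = hivemind.PeerID.from_base58(peer_id_string)
```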