bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Fix dummy cache allocation #574

Closed — artek0chumak closed this 2 months ago

artek0chumak commented 2 months ago

Fix allocation of the dummy key_value cache. The dummy cache is not used in actual computations, but PyTorch's device checks require it to be on the correct device.
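A rough sketch of the idea behind the fix (the function name and signature here are hypothetical, not the actual Petals code): allocate the placeholder key/value tensors directly on the target device, so that PyTorch's device-consistency checks pass even though the contents are never read.

```python
import torch

def make_dummy_kv_cache(batch_size: int, num_heads: int, seq_len: int,
                        head_dim: int, device: torch.device,
                        dtype: torch.dtype = torch.float32):
    # Hypothetical helper, for illustration only. The dummy cache carries no
    # real data, but every tensor passed through the attention call must live
    # on the same device as the model weights, so we allocate the empty
    # placeholders on the target device up front.
    shape = (batch_size, num_heads, seq_len, head_dim)
    dummy_keys = torch.empty(shape, device=device, dtype=dtype)
    dummy_values = torch.empty(shape, device=device, dtype=dtype)
    return dummy_keys, dummy_values

keys, values = make_dummy_kv_cache(1, 2, 0, 4, torch.device("cpu"))
```

Allocating with `torch.empty` (rather than `torch.zeros`) is enough here, since the values are never used in computation.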