bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Force use_cache=True in config only #497

Closed: borzunov closed this 10 months ago

borzunov commented 10 months ago

This reverts part of #496 and instead overrides use_cache in LlamaConfig only (so the correct value is visible to HF .generate() as well).
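
For context, here is a minimal sketch of a config-level override of this kind. It assumes the Hugging Face transformers LlamaConfig; the subclass name is hypothetical and this is not the exact Petals patch:

```python
# Sketch: force use_cache=True on the config object itself, so any code that
# reads config.use_cache (including HF's .generate() utilities) sees the
# overridden value. The class name is hypothetical.
from transformers import LlamaConfig

class ForcedCacheLlamaConfig(LlamaConfig):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Override at the config level rather than per forward() call, so the
        # value is consistent everywhere the config is consulted.
        self.use_cache = True

# Even if a caller explicitly passes use_cache=False, the override wins:
config = ForcedCacheLlamaConfig(use_cache=False)
assert config.use_cache is True
```

Overriding the flag on the config (instead of forcing it inside forward()) matters because .generate() reads config.use_cache to decide whether to pass past key/value caches between steps; a forward()-only override would leave generation seeing a stale value.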