bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Force use_cache=True in config only #497

Closed: borzunov closed this 10 months ago

borzunov commented 10 months ago

This reverts part of #496 and instead overrides use_cache in LlamaConfig only (so the correct value is visible to HF .generate() as well).
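
For context, here is a minimal sketch of a config-level override of this kind. It assumes the Hugging Face transformers LlamaConfig; the subclass name is hypothetical and this is not the exact Petals patch:

```python
# Sketch: force use_cache=True on the config object itself, so any code that
# reads config.use_cache (including HF's .generate() utilities) sees the
# overridden value. The class name is hypothetical.
from transformers import LlamaConfig

class ForcedCacheLlamaConfig(LlamaConfig):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Override at the config level rather than per forward() call, so the
        # value is consistent everywhere the config is consulted.
        self.use_cache = True

# Even if a caller explicitly passes use_cache=False, the override wins:
config = ForcedCacheLlamaConfig(use_cache=False)
assert config.use_cache is True
```

Overriding the flag on the config (instead of forcing it inside forward()) matters because .generate() reads config.use_cache to decide whether to pass past key/value caches between steps; a forward()-only override would leave generation seeing a stale value.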