amaiya / onprem

A tool for running on-premises large language models with non-public data
https://amaiya.github.io/onprem
Apache License 2.0

offload_kqv not properly set #50

Closed: amaiya closed this issue 8 months ago

amaiya commented 8 months ago

Recent versions of llama-cpp-python appear to run slower because offload_kqv is not set properly.

Make sure offload_kqv is set to True for fast GPU-accelerated inference.
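A minimal sketch of setting the flag when calling llama-cpp-python directly. The helper name build_llama_kwargs and the model path are hypothetical and are not part of onprem. n_gpu_layers and offload_kqv are keyword arguments of llama_cpp.Llama in recent releases, but check the signature of the version you have installed.

```python
def build_llama_kwargs(model_path, n_gpu_layers=-1, **overrides):
    """Build keyword arguments for llama_cpp.Llama with offload_kqv on by default.

    Hypothetical helper for illustration only; it is not part of onprem.
    """
    kwargs = {
        "model_path": model_path,
        # Assumption: -1 asks llama-cpp-python to offload all layers to the GPU.
        "n_gpu_layers": n_gpu_layers,
        # Keep the K/Q/V tensors (including the KV cache) on the GPU.
        "offload_kqv": True,
    }
    # Explicit caller choices win over the defaults above.
    kwargs.update(overrides)
    return kwargs


if __name__ == "__main__":
    kwargs = build_llama_kwargs("/path/to/model.gguf")  # hypothetical path
    print(kwargs)
    # With llama-cpp-python installed and a real model file:
    # from llama_cpp import Llama
    # llm = Llama(**kwargs)
```

Passing offload_kqv=False as an override turns the default off again, which may be useful for comparing speed with and without the setting.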

amaiya commented 7 months ago

Because offload_kqv=True is now supplied automatically, LangChain issues the following warning:

 WARNING! offload_kqv is not default parameter.
                offload_kqv was transferred to model_kwargs.
                Please confirm that offload_kqv is what you intended.

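The warning can be safely ignored: it only reports that offload_kqv was passed through model_kwargs, which is the intended behavior.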
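If the message is distracting, it may be possible to filter it out. The sketch below assumes LangChain emits this message through Python's standard warnings module; if your version uses logging instead, the filter has no effect. The function name silence_offload_kqv_warning is hypothetical.

```python
import warnings

# Regex matched (case-insensitively) against the start of the warning message.
OFFLOAD_KQV_WARNING = r"\s*WARNING! offload_kqv is not default parameter"


def silence_offload_kqv_warning():
    """Ignore only the offload_kqv notice; all other warnings are unaffected.

    Assumption: the message is raised via warnings.warn, not a logger.
    """
    warnings.filterwarnings("ignore", message=OFFLOAD_KQV_WARNING)
```

Call silence_offload_kqv_warning() once before the model is constructed. Matching on the message text keeps the filter narrow, so unrelated warnings still surface.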