Closed amaiya closed 8 months ago
As a result of auto-supplying offload_kqv=True, LangChain issues the following warning:
WARNING! offload_kqv is not default parameter.
offload_kqv was transferred to model_kwargs.
Please confirm that offload_kqv is what you intended.
The warning can be ignored.
The latest versions of llama-cpp-python seem slower because offload_kqv is not properly set. Ensure offload_kqv is set to True for fast GPU-accelerated inference.
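
For anyone setting this by hand, here is a minimal sketch of one way to pass offload_kqv through LangChain's LlamaCpp wrapper. It assumes the wrapper is importable from langchain_community.llms and that it forwards model_kwargs to llama-cpp-python unchanged. Putting the flag in model_kwargs directly, rather than as a top-level argument, should also avoid the warning above. The helper names build_llamacpp_kwargs and load_llm and the n_gpu_layers=-1 default are illustrative choices, not something from this issue.

```python
from typing import Any, Dict


def build_llamacpp_kwargs(model_path: str, n_gpu_layers: int = -1) -> Dict[str, Any]:
    """Assemble keyword arguments for LangChain's LlamaCpp wrapper (hypothetical helper)."""
    return {
        "model_path": model_path,
        # -1 asks llama-cpp-python to offload all layers to the GPU.
        "n_gpu_layers": n_gpu_layers,
        # offload_kqv is not a declared LlamaCpp field, so it is passed via
        # model_kwargs explicitly instead of as a top-level argument.
        "model_kwargs": {"offload_kqv": True},
    }


def load_llm(model_path: str):
    """Create the LLM; needs langchain-community and llama-cpp-python installed."""
    # Imported lazily so the helper above can be used without these packages.
    from langchain_community.llms import LlamaCpp  # assumed import path

    return LlamaCpp(**build_llamacpp_kwargs(model_path))
```

Calling load_llm with the path to a local model file should then produce an LLM with offload_kqv enabled. Whether this restores the earlier speed depends on the installed llama-cpp-python version and on how it was built for the GPU.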