KamilLegault opened 8 months ago

Is there any documentation on how to have the playground connect to a locally hosted model (llama.cpp)? I have not been able to figure out how to do it.
Hi @KamilLegault,
you can have a look here: https://lmql.ai/docs/models/llama.cpp.html#model-server.
You can start an LMTP inference endpoint by running:

```
lmql serve-model llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf
```
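If you need a larger context window, the linked docs also show passing llama.cpp arguments through to the server; a minimal sketch (`--n_ctx` is taken from the llama.cpp docs page, and `--port` is assumed here to be the usual LMTP server option):

```
lmql serve-model llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf --n_ctx 2048 --port 8080
```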
In the playground, you then need to specify which model to use, e.g.:

```
argmax
    "What is the capital of France? [RESPONSE]"
from
    lmql.model("llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf")
where
    len(TOKENS(RESPONSE)) < 20
```
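The same query can also be run from Python instead of the playground. A minimal sketch, assuming the serve-model endpoint from above is running (the decorator pattern follows the LMQL Python API; the function name is made up for illustration):

```python
import lmql

# Connects to the running LMTP endpoint; the model name must match
# the one passed to `lmql serve-model` exactly.
@lmql.query(model=lmql.model("llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf"))
def capital_query():
    '''lmql
    "What is the capital of France? [RESPONSE]" where len(TOKENS(RESPONSE)) < 20
    return RESPONSE
    '''

# LMQL query functions can be called from a synchronous context.
print(capital_query())
```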
Without `local:` in front of `llama.cpp:`, the playground will look for that exact model running on the inference endpoint, as stated in the documentation.
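For comparison, a model reference with the `local:` prefix would instead load the model in-process, without a separate endpoint (this variant is also described in the linked docs):

```
lmql.model("local:llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf")
```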
Hope that helps :)
Best,
Leon
In addition to what @reuank said, you can also specify the default model for the playground on launch. For instance:

```
LMQL_DEFAULT_MODEL='local:gpt2' lmql playground
```

This way, queries without a `from` clause will also use `local:gpt2` by default.
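Applied to the llama.cpp setup above, this would presumably be (assuming the same model path as passed to `lmql serve-model`):

```
LMQL_DEFAULT_MODEL='llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf' lmql playground
```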