eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

Starting the playground with a self-hosted model #277

Open KamilLegault opened 8 months ago

KamilLegault commented 8 months ago

Is there any documentation on how to have the playground connect to a locally hosted model (llama.cpp)? I have not been able to figure out how to do it.

reuank commented 8 months ago

Hi @KamilLegault,

You can have a look here: https://lmql.ai/docs/models/llama.cpp.html#model-server.

You can start an LMTP inference endpoint by running:

lmql serve-model llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf
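
If you need to adjust the backend, the linked docs page also describes forwarding additional arguments after the model path to llama.cpp, e.g. the context size (flag taken from the docs, adjust to your build):

lmql serve-model llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf --n_ctx 1024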

In the playground, you then need to specify which model to use, e.g.:

argmax 
    "What is the capital of France? [RESPONSE]"
from 
    lmql.model("llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf")
where 
    len(TOKENS(RESPONSE)) < 20

Without the local: prefix in front of llama.cpp:, the playground will look for that exact model on the running inference endpoint, as stated in the documentation.
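
To make the distinction concrete, a small sketch (the in-process variant follows the linked docs rather than something tested in this thread):

# connects to the model served by `lmql serve-model` above
lmql.model("llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf")

# loads the model in-process instead, no separate serve-model process needed
lmql.model("local:llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf")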

Hope that helps :)

Best, Leon

lbeurerkellner commented 8 months ago

In addition to what @reuank said, you can also specify the default model for the playground on launch.

For instance:

LMQL_DEFAULT_MODEL='local:gpt2' lmql playground

This way, queries without a from clause will also use local:gpt2 by default.
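
So, to make the playground default to the self-hosted model from above (assuming the serve-model endpoint is already running), the same pattern should work:

LMQL_DEFAULT_MODEL='llama.cpp:/YOUR_PATH/YOUR_MODEL.gguf' lmql playground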