quantumalchemy opened this issue 1 month ago
Hi @quantumalchemy, which llamafile did you use, and how did you run it?
I just tried a llamafile which was created with version 0.8.13, and it returns the model description on /v1/models:
Steps to reproduce:
Download the llamafile: https://huggingface.co/wirthual/Meta-Llama-3.2-1B-Instruct-llamafile/resolve/main/Meta-Llama-3.2-1B-Instruct-Q3_K_L.llamafile
chmod +x Meta-Llama-3.2-1B-Instruct-Q3_K_L.llamafile
./Meta-Llama-3.2-1B-Instruct-Q3_K_L.llamafile
Then you can visit http://localhost:8080/v1/models
and you should see the model definition.
Or via the terminal:
curl -H 'Content-Type: application/json' -X GET localhost:8080/v1/models
Which results in:
{
"data": [
{
"created": 1729127562,
"id": "Llama-3.2-1B-Instruct-Q3_K_L.gguf",
"object": "model",
"owned_by": "llamacpp"
}
],
"object": "list"
}
This follows the OpenAI API spec.
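Because the response follows the spec, an OpenAI client pointed at the local server can list the model. Below is a minimal sketch using the openai Python package; the base_url, port, and placeholder API key are assumptions based on the default run above (no API key configured on the server).

# Minimal sketch: list models from a local llamafile server.
# Assumes the server from the steps above is running on localhost:8080
# and requires no API key (the key below is a placeholder).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="no-key-needed")

for model in client.models.list():
    print(model.id, model.owned_by)  # e.g. Llama-3.2-1B-Instruct-Q3_K_L.gguf llamacpp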
To check your llamafile version, you can run:
./Meta-Llama-3.2-1B-Instruct-Q3_K_L.llamafile --version
llamafile v0.8.13
Feature Description
Feature request: add a /v1/models endpoint for further OpenAI API compatibility. Many apps and projects request a list of models when using an OpenAI-compatible proxy such as llamafile / llama.cpp, but they fail because the endpoint is missing.
Motivation
Many apps and projects request a list of models when using an OpenAI-compatible proxy such as llamafile / llama.cpp.
Possible Implementation
Add a mock /v1/models endpoint, or return the model passed with -m, to emulate a valid OpenAI response.
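For illustration only, here is a rough Python sketch of the response shape such a mock endpoint could produce; the real change would live in the llamafile / llama.cpp server code, and the model id below is a hypothetical stand-in for whatever -m points at.

# Illustration: shape of a mock /v1/models response, as a tiny standalone server.
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_ID = "Llama-3.2-1B-Instruct-Q3_K_L.gguf"  # would come from the -m flag

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/v1/models":
            self.send_error(404)
            return
        body = json.dumps({
            "object": "list",
            "data": [{
                "id": MODEL_ID,
                "object": "model",
                "created": int(time.time()),
                "owned_by": "llamacpp",
            }],
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("localhost", 8080), Handler).serve_forever()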