quantumalchemy opened this issue 1 month ago
Hi @quantumalchemy, which llamafile did you use, and how did you run it?
I just tried a llamafile which was created with version 0.8.13, and it returns the model description on /v1/models:
Steps to reproduce:
Download the llamafile: https://huggingface.co/wirthual/Meta-Llama-3.2-1B-Instruct-llamafile/resolve/main/Meta-Llama-3.2-1B-Instruct-Q3_K_L.llamafile
chmod +x Meta-Llama-3.2-1B-Instruct-Q3_K_L.llamafile
./Meta-Llama-3.2-1B-Instruct-Q3_K_L.llamafile
Then you can visit http://localhost:8080/v1/models
and you should see the model definition.
Or via the terminal:
curl -H 'Content-Type: application/json' -X GET localhost:8080/v1/models
Which results in:
{
"data": [
{
"created": 1729127562,
"id": "Llama-3.2-1B-Instruct-Q3_K_L.gguf",
"object": "model",
"owned_by": "llamacpp"
}
],
"object": "list"
}
This follows the OpenAI API spec.
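Because the response follows the spec, an OpenAI client pointed at the local server can list the model. Below is a minimal sketch using the openai Python package; the base_url, port, and placeholder API key are assumptions based on the default run above (no API key configured on the server).

# Minimal sketch: list models from a local llamafile server.
# Assumes the server from the steps above is running on localhost:8080
# and requires no API key (the key below is a placeholder).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="no-key-needed")

for model in client.models.list():
    print(model.id, model.owned_by)  # e.g. Llama-3.2-1B-Instruct-Q3_K_L.gguf llamacpp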
To check your llamafile version, you can run:
./Meta-Llama-3.2-1B-Instruct-Q3_K_L.llamafile --version
llamafile v0.8.13
Feature Description
Feature request: add a /v1/models endpoint for further OpenAI API compatibility. Many apps and projects request a list of models when using an OpenAI-compatible proxy such as llamafile / llama.cpp, but they fail because the endpoint is missing.
Motivation
Many apps and projects request a list of models when using an OpenAI-compatible proxy such as llamafile / llama.cpp.
Possible Implementation
Add a mock /v1/models endpoint, or return the model passed with -m, to emulate a valid OpenAI response.
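For illustration only, here is a rough Python sketch of the response shape such a mock endpoint could produce; the real change would live in the llamafile / llama.cpp server code, and the model id below is a hypothetical stand-in for whatever -m points at.

# Illustration: shape of a mock /v1/models response, as a tiny standalone server.
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_ID = "Llama-3.2-1B-Instruct-Q3_K_L.gguf"  # would come from the -m flag

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/v1/models":
            self.send_error(404)
            return
        body = json.dumps({
            "object": "list",
            "data": [{
                "id": MODEL_ID,
                "object": "model",
                "created": int(time.time()),
                "owned_by": "llamacpp",
            }],
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("localhost", 8080), Handler).serve_forever()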