Mobile-Artificial-Intelligence / maid

Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.

No notification that the model list has updated #538

Closed · khimaros closed this 1 month ago

khimaros commented 2 months ago

EDIT: the issue is that the API endpoint must be configured in the LLM Parameters, and then you need to return to the main screen and wait for the /v1/models call to complete before the list of models is populated. it may also be necessary to choose a simple model alias via --alias (such as gpt-3.5-turbo) if running llama.cpp/examples/server.
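for anyone landing here, a minimal launch sketch of the setup that ended up working (model path and bind address are illustrative; at the time the examples/server binary built as ./server, newer llama.cpp builds name it llama-server):

```sh
# build the OpenAI-compatible server from llama.cpp/examples/server
cd llama.cpp && make server

# -m preloads a model; --alias sets the short model id the API exposes
./server -m /models/mistral-7b-instruct.gguf --alias gpt-3.5-turbo --host 0.0.0.0 --port 8080
```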

original (no longer relevant) issue contents:

operating system: android
maid release: 1.2.6 (arm64)

i'm attempting to use an OpenAI compatible endpoint (llama.cpp/server running on my own machine).

however, after configuring a fake API key and the correct server endpoint (which works in BetterChatGPT), maid refuses to accept my prompt with the error "A model option is required for prompting"
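to rule out the server side, the endpoint can be checked directly, independent of maid (the address is illustrative; llama.cpp's server doesn't actually validate the key, but OpenAI-style clients usually insist on sending one):

```sh
# list the models the server advertises; this is the same /v1/models
# call maid makes when populating its model list
curl http://192.168.1.10:8080/v1/models -H "Authorization: Bearer sk-fake"
```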

i tried this with the OpenAI option as well as Ollama.

am i holding it wrong?

khimaros commented 2 months ago

verified this is still happening with the latest CI build as well https://github.com/Mobile-Artificial-Intelligence/maid/actions/runs/8978792560/job/24659782543

khimaros commented 2 months ago

i see, the list of models doesn't show up until after the API endpoint has been configured and queried (this happens asynchronously), and then the model can be selected from the dropdown in the top right. updating this issue to clarify that it's a discoverability issue.

however, it may be necessary to launch llama.cpp/examples/server with --alias gpt-3.5-turbo or, at the least, with an alias which is not too long and doesn't contain unusual characters like /. the default id from llama.cpp/examples/server is the model's filesystem path, which tends to have both problems.

danemadsen commented 2 months ago

I don't quite understand what you're asking. Are you trying to use the OpenAI API without a model selection (i.e. specifying the model in llama.cpp server on the host machine)? That's not behavior I want to encourage over that API, as it will certainly confuse people, though I can create another API option for llama.cpp server to do that exact behavior.

khimaros commented 2 months ago

@danemadsen sorry, my initial report was pretty rambling and exploratory. let me try to summarize what i've learned.

first, some llama.cpp-specific bits of information which i think might be helpful for context:

1) llama.cpp/examples/server has a built-in OpenAI compatible endpoint
2) when launching the llama.cpp server, users typically specify a model to preload on the command line (with the -m flag). the id for these models is the path to the model file provided in the flag, which can sometimes be a long, absolute, filesystem path.
3) llama.cpp server also has a --alias flag which can be used to shorten the id to something more usable (in my case, i'm using --alias gpt-3.5-turbo for consistency and compatibility with the widest variety of software). the sketch below shows the difference.
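to make the id difference concrete, here's roughly what /v1/models reports in each case (path, address, and the exact response shape are illustrative):

```sh
# launched with only -m: the model id is the filesystem path
#   ./server -m /models/mistral-7b-instruct.gguf
curl -s http://192.168.1.10:8080/v1/models
# => {"object":"list","data":[{"id":"/models/mistral-7b-instruct.gguf", ...}]}

# launched with --alias: the id becomes the alias
#   ./server -m /models/mistral-7b-instruct.gguf --alias gpt-3.5-turbo
curl -s http://192.168.1.10:8080/v1/models
# => {"object":"list","data":[{"id":"gpt-3.5-turbo", ...}]}
```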

now for the maid-specific learnings:

1) when using the OpenAI mode in maid, you must configure LLM Parameters to set the API endpoint and API key to point it at your llama.cpp server.
2) after changing the endpoint and API key, you need to go "back" to the main maid screen in order to trigger a model list update.
3) the model list update happens asynchronously, and it may take a bit of time before the model list is populated.
4) after the model list is populated, you must select the model from the dropdown in the top right. there is no way to enter a custom model name in the LLM Parameters (which is where i was looking for it).

i think nothing is a blocker for my use at this point. however, the user experience is worth considering given the asynchronous nature of the model list population: currently there is no notification that makes it clear the list has been updated, so the user just needs to keep checking. it also wasn't obvious to me that the dropdown would be the place to find the model name; i had to stumble upon this accidentally after poking through the dart source code.