Mobile-Artificial-Intelligence / maid

Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.
MIT License

Can't change n_ctx? #595

Closed by danemadsen 3 months ago

danemadsen commented 3 months ago

Discussed in https://github.com/Mobile-Artificial-Intelligence/maid/discussions/594

Originally posted by **mlsterpr0** August 1, 2024: On Android, n_ctx and n_predict seem to always be at 512 according to the app settings console, although I'm setting them higher. Is it a bug or just my budget phone? Gemma 2 2B works great, it just stops generating after the context length reaches 512 (I guess?). My phone has 6 GB of RAM.
jensen1207 commented 3 months ago

Hi danemadsen,

Which Gemma 2 2B version are you using? The model I found on Hugging Face responds very slowly. Could you please share your model link?


shiptoorion commented 3 months ago

Same thing here. Model: https://huggingface.co/bartowski/gemma-2-2b-it-abliterated-GGUF/resolve/main/gemma-2-2b-it-abliterated-Q5_K_M.gguf?download=true

It stops in the middle of a sentence when it reaches a certain context limit, and every time I ask anything else in the same chat, it responds with just one symbol.

Android, 8 GB RAM, MediaTek

shiptoorion commented 3 months ago

Update: It's not about the model. https://huggingface.co/bartowski/Hercules-5.0-Qwen2-1.5B-GGUF/resolve/main/Hercules-5.0-Qwen2-1.5B-Q4_K_M.gguf?download=true This one also gets stuck after several messages.