Closed by danemadsen 3 months ago
Hi danemadsen,
Which Gemma 2 2B version are you using? The model I found on Hugging Face responds very slowly. Could you please share a link to your model?
Discussed in #594
Originally posted by mlsterpr0 on August 1, 2024: On Android, n_ctx and n_predict always seem to be 512 according to the app settings console, even though I'm setting them higher. Is this a bug, or just my budget phone? Gemma 2 2B works great, but it just stops generating once the context length reaches 512 (I guess?). My phone has 6 GB of RAM.
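The symptom described above (generation halting mid-sentence once a fixed token count is reached) is exactly what a hard n_predict/n_ctx cap of 512 would look like. A minimal sketch simulating such a cap, purely for illustration (this is a hypothetical helper, not Maid's actual code):

```python
# Hypothetical sketch: simulate how a hard token budget like n_predict=512
# truncates a reply mid-sentence, regardless of what the user configured.
def generate(reply_tokens, n_predict=512):
    # A real runner samples until an end-of-sequence token; with a hard
    # cap it simply stops after n_predict tokens, even mid-sentence.
    return reply_tokens[:n_predict]

wanted = ["tok"] * 600          # pretend the model wanted to emit 600 tokens
out = generate(wanted)
print(len(out))                 # capped at 512, so the tail of the reply is lost
```

If the app silently ignores the configured value and always passes 512 to the backend, every long reply would be cut off this way.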
Same thing here. Model: https://huggingface.co/bartowski/gemma-2-2b-it-abliterated-GGUF/resolve/main/gemma-2-2b-it-abliterated-Q5_K_M.gguf?download=true
It stops in the middle of a sentence when it reaches a certain context limit, and every time I ask anything else in the same chat, it responds with just one symbol.
Android, 8 GB RAM, MediaTek.
Update: it's not the model. https://huggingface.co/bartowski/Hercules-5.0-Qwen2-1.5B-GGUF/resolve/main/Hercules-5.0-Qwen2-1.5B-Q4_K_M.gguf?download=true also gets stuck after several messages.