Mobile-Artificial-Intelligence / maid

Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.
MIT License

Documentation Improvements (getting started guide, benchmarks) #562

Closed: bradflaugher closed this 2 weeks ago

bradflaugher commented 1 month ago

Hi @danemadsen, thanks for your hard work on this!

I'd like to write a user guide in a PR for noobs to figure out which models to use, and to debug various popular FOSS .gguf models from Hugging Face. I'm also thinking of some kind of table of Android benchmarks that shows tokens-per-second output or something.

Can you share some links with me so I can help? I've tried TinyLlama and Phi-3 and mostly got them to work, but if you have any resources you think I should use for this, I'd be happy to write it up. I could go to the main llama.cpp repo or something, but that seems like overkill; your thoughts are appreciated!

danemadsen commented 1 month ago

Yeah, some docs would be a great addition. You can add them to the wiki or just in the /docs directory if you like. I'm thinking of making it so the user can download models from within the app at some point in the future, so a list of well-performing and commonly used models would be helpful for that. As for links, I'm not really sure what links you're looking for; could you elaborate?

bradflaugher commented 1 month ago

Right now the only docs you seem to have are the screenshots included in the README.md.

If someone downloads Maid, grabs a random GGUF from Hugging Face, and tries to run it on their phone, most of the time they're going to mess something up, and it will look like Maid itself is broken, when really they're using an unsupported model, the format is incorrect, or they've messed up some setting.

So do you have any of the following?

  1. Do you have any docs on models that you've tested and that worked well? I assumed Phi-3, saw some chatter here about TinyLlama, and your screenshot references calypso_5_0_alphav2.gguf.
  2. Do you have any ideas of what I should be testing? Any .gguf from TheBloke built from a model with under 8B parameters?
  3. What about prompt formats? It's not obvious to me whether Phi-3 should be using Alpaca, ChatML, or something else (see the sketch after this list for what those look like).
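
For context, a minimal sketch of what these formats look like; the message text is placeholder, and the Phi-3 block follows the format published on its model card (similar to ChatML but with different delimiters):

```python
# Placeholder messages, just to show the shape of each prompt format.
system, user = "You are a helpful assistant.", "Hello!"

# Alpaca: plain-text instruction/response sections.
alpaca = f"### Instruction:\n{user}\n\n### Response:\n"

# ChatML: role-tagged turns delimited by <|im_start|>/<|im_end|>.
chatml = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Phi-3's own format, per its model card.
phi3 = (
    f"<|system|>\n{system}<|end|>\n"
    f"<|user|>\n{user}<|end|>\n"
    "<|assistant|>\n"
)
```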

I want to give beginners a table of models they can start with and parameters they can use. For example, I want to make something like this (all of this is dummy data for now):

| Model Name | Parameter Count | Tokens per Second (on Pixel 8 Pro) | Usage Notes | Hugging Face GGUF Link | Prompt Format |
|---|---|---|---|---|---|
| Phi3 | 1.2B | 5,000 | Excels at creative writing and storytelling. | thebloke/phi3-quantized | Alpaca |
| TinyLlama | 7B | 10,000 | Strong performance in question-answering and summarization tasks. | thebloke/tinyllama-quantized | Alpaca |
| NanoGPT | 125M | 2,500 | Efficient model for text generation and completion. | thebloke/nanogpt-quantized | OpenAI |
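
To fill in the tokens-per-second column, a rough benchmarking sketch, assuming llama-cpp-python is installed; the model path and prompt here are placeholders:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path: swap in whichever GGUF you're benchmarking.
llm = Llama(model_path="models/phi-3-mini-4k-instruct.Q4_K_M.gguf", verbose=False)

start = time.perf_counter()
out = llm("Write a short poem about the sea.", max_tokens=128)
elapsed = time.perf_counter() - start

n = out["usage"]["completion_tokens"]
print(f"{n} tokens in {elapsed:.1f}s = {n / elapsed:.1f} tok/s")
```

Note this measures speed on whatever machine runs the script; Pixel numbers would need the same model running on-device.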
danemadsen commented 1 month ago

No, I haven't really kept docs on any of the timings for the models I've tested. Yes, I used to test with Calypso, but now I primarily test with Phi-3.

No idea what other models you should test beyond the ones listed. Yes, anything under 8B is a good start. I can get up to 13B models running on my own phone, so you can try that too, but they will definitely be slow.

I believe Phi-3 uses its own prompt format, similar to ChatML. I haven't been able to get llama.cpp to work well with it at the moment, hence why I'm testing with it.

bradflaugher commented 1 month ago

Ok noted. I'll get testing and see what I can find.

bradflaugher commented 3 weeks ago

https://huggingface.co/models?library=gguf&sort=downloads

Working through this list. Sorry for the delay; had a baby 3 weeks ago.
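
For anyone following along, a minimal sketch of pulling one of these GGUFs down programmatically, assuming huggingface_hub is installed; the repo and filename below are just illustrative picks from that list:

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Illustrative repo/filename; substitute whichever GGUF you're testing.
path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
)
print(path)  # local cache path, ready to copy to the device
```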

bradflaugher commented 3 weeks ago

I think these should be a good place to start

from https://play.google.com/store/apps/details?id=com.druk.lmplayground

(screenshot: the model list from LM Playground)

apieum commented 3 weeks ago

Hi, I've quantized Llama-3-8B-Instruct to Q4_K_M to try your app: https://huggingface.co/squaredlogics/Llama-3-8B-Instruct-Q4_K_M.gguf
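
Roughly the standard llama.cpp flow for producing such a quant, as a sketch; paths are placeholders, and it assumes convert-hf-to-gguf.py and a built quantize binary from a mid-2024 llama.cpp checkout:

```python
import subprocess

# Step 1: convert the Hugging Face checkpoint to an f16 GGUF.
subprocess.run(
    ["python", "convert-hf-to-gguf.py", "Meta-Llama-3-8B-Instruct",
     "--outfile", "llama-3-8b-instruct-f16.gguf", "--outtype", "f16"],
    check=True,
)

# Step 2: quantize the f16 GGUF down to Q4_K_M.
subprocess.run(
    ["./quantize", "llama-3-8b-instruct-f16.gguf",
     "llama-3-8b-instruct-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```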

I also tried CapybaraHermes-2.5-Mistral-7B from TheBloke...

It works perfectly with llama.cpp on my computer, but it gives random answers in your app and loops indefinitely on random prompts.

(screenshot: the model's looping, off-topic output in Maid)

I've tried to add the Llama 3 template:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>{{ user_msg_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>{{ model_answer_1 }}<|eot_id|>
```
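
Two things worth checking against Meta's published format, sketched below: the official template puts a blank line (two newlines) after each <|end_header_id|>, and generation needs to stop on <|eot_id|>, otherwise the model tends to loop. All message text here is placeholder:

```python
def format_llama3(system_prompt: str, user_msg: str) -> str:
    # Meta's Llama 3 chat format: note the "\n\n" after each
    # <|end_header_id|>, which the template above omits.
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama3("You are a helpful assistant.", "Hello!")
# The runner must treat "<|eot_id|>" as a stop token; if it only stops on
# the default EOS, Llama 3 GGUFs often ramble or loop indefinitely.
```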

It's quite weird. Since you're listing models, maybe you can try mine to see what I'm doing wrong, and document it to prevent others from doing the same.

bradflaugher commented 3 weeks ago

I got the same thing! The quadratic-equation stuff has to be prompt-structure related.

bradflaugher commented 2 weeks ago

Going to abandon this in favor of https://github.com/Mobile-Artificial-Intelligence/maid/issues/579

I haven't been able to get many models to work out of the box.