Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

Add Phi-3-mini-128k-instruct to README.md #436

Closed: AlessandroW closed this issue 1 month ago

AlessandroW commented 1 month ago

Thanks to the recent llama.cpp update, I was able to convert the Phi-3 Mini 128k model to GGUF/llamafile in both f16 and Q4_K_M versions. Given its large context size, it might be useful to others.
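For anyone who wants to reproduce the conversion, the overall workflow looks roughly like the sketch below. It assumes a llama.cpp checkout and the llamafile tools from around this time; exact script and binary names vary by version (newer llama.cpp checkouts rename `quantize` to `llama-quantize`), and the file names here are illustrative, not the ones actually published:

```sh
# 1. Convert the Hugging Face checkpoint to an f16 GGUF
#    (run from a llama.cpp checkout).
python convert-hf-to-gguf.py Phi-3-mini-128k-instruct/ \
  --outtype f16 \
  --outfile phi-3-mini-128k-instruct.f16.gguf

# 2. Quantize the f16 weights down to Q4_K_M.
./quantize phi-3-mini-128k-instruct.f16.gguf \
  phi-3-mini-128k-instruct.Q4_K_M.gguf Q4_K_M

# 3. Package the GGUF as a self-contained llamafile: copy the
#    llamafile launcher binary, then embed the weights with zipalign.
cp llamafile phi-3-mini-128k-instruct.Q4_K_M.llamafile
zipalign -j0 phi-3-mini-128k-instruct.Q4_K_M.llamafile \
  phi-3-mini-128k-instruct.Q4_K_M.gguf
```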

Cheers, Alessandro

jart commented 1 month ago

Thank you for sending this. I'm really happy to see you distributing llamafiles on Hugging Face. Once llama.cpp fixes ggml-cuda and I sync it here, I'm planning to upload llamafiles for phi-3-medium-128k-instruct to Hugging Face under Mozilla's account. The full 128k context (-c 0) currently needs ~40 GB of memory, so I think it makes sense to use the largest model Microsoft provides, since the weights themselves are comparatively small.
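For context, `-c 0` tells llamafile to take the context size from the GGUF metadata (128k tokens for this model) rather than the smaller default, and the ~40 GB figure is dominated by the KV cache at full context rather than by the quantized weights. A minimal invocation might look like this sketch; the flags are the standard llama.cpp-style CLI options, and the file name is the hypothetical one from the conversion example above:

```sh
# After chmod +x on the llamafile, run with the model's full context
# window: -c 0 means "use the context size declared in the GGUF
# metadata" (128k tokens here). Expect on the order of 40 GB of
# memory at full context, mostly for the KV cache.
./phi-3-mini-128k-instruct.Q4_K_M.llamafile -c 0 \
  -p "Summarize the following document: ..."
```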