AlessandroW closed this 1 month ago
Thank you for sending this. I'm really happy to see you distributing llamafiles on Hugging Face. Once llama.cpp fixes ggml-cuda and I sync it here, I'm planning to upload llamafiles for phi-3-medium-128k-instruct to Hugging Face under Mozilla's account. With `-c 0`, the full 128k context currently needs ~40 GB of memory, so I think it makes sense to use the largest model Microsoft provides, since the weights themselves are comparatively small.
Thanks to the recent llama.cpp update I was able to convert the Phi-3 Mini 128k model to GGUF/llamafile in both f16 and Q4_K_M versions. Given the large context size, it might be useful to others.
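For anyone wanting to reproduce this, the conversion went roughly like the sketch below. The checkpoint directory and output file names are illustrative; `convert-hf-to-gguf.py` and the quantize tool ship with llama.cpp, and the exact quantize binary name may differ between llama.cpp versions:

```shell
# Convert the Hugging Face checkpoint to GGUF at f16
# (local checkpoint path and output names are placeholders)
python convert-hf-to-gguf.py ./Phi-3-mini-128k-instruct \
    --outtype f16 \
    --outfile Phi-3-mini-128k-instruct-f16.gguf

# Quantize the f16 GGUF down to Q4_K_M
./llama-quantize Phi-3-mini-128k-instruct-f16.gguf \
    Phi-3-mini-128k-instruct-Q4_K_M.gguf Q4_K_M

# Run with the model's full context: -c 0 takes the maximum context
# length from the GGUF metadata (~40 GB of memory at 128k here)
./llamafile -m Phi-3-mini-128k-instruct-Q4_K_M.gguf -c 0
```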
Cheers, Alessandro