Aisuko opened 6 months ago
Hugging Face transformers already supports GGUF, but only for a handful of model architectures. So we will run some tests first; if they pass, we can support CPU-accelerated inference smoothly. See our discussion for more detail.
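As a starting point for those tests, a minimal sketch of loading a GGUF checkpoint through transformers (which dequantizes supported architectures on load). The quantization filename passed as `gguf_file` is an assumption, not confirmed against the repo contents:

```python
# Sketch: loading a GGUF checkpoint with Hugging Face transformers.
# Only a limited set of architectures is supported, so this is the
# kind of check we want to run first.

def gguf_load_kwargs(repo_id: str, gguf_file: str) -> dict:
    """Build the kwargs transformers needs to load a single GGUF file."""
    return {
        "pretrained_model_name_or_path": repo_id,
        "gguf_file": gguf_file,  # tells transformers to read the GGUF weights
    }

def load_gguf(repo_id: str, gguf_file: str):
    # Imported lazily so the helper above stays usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = gguf_load_kwargs(repo_id, gguf_file)
    tokenizer = AutoTokenizer.from_pretrained(**kwargs)
    model = AutoModelForCausalLM.from_pretrained(**kwargs)
    return tokenizer, model

# Example (downloads the model, so not run here; filename is an assumption):
# tok, model = load_gguf("microsoft/Phi-3-mini-4k-instruct-gguf",
#                        "Phi-3-mini-4k-instruct-q4.gguf")
```

If the architecture is unsupported, `from_pretrained` will raise, which is exactly the signal the test pass is meant to collect.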
Currently, we support CPU-accelerated inference using llama.cpp. However, we will keep working on the kimchima repo: we still need to implement CPT and fine-tuning in kimchima.
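For reference, the current CPU path can be sketched with the llama-cpp-python bindings to llama.cpp. The model path and thread count here are placeholders, not values from the repo:

```python
# Sketch: CPU-only inference on a GGUF model via llama-cpp-python
# (pip install llama-cpp-python). n_gpu_layers=0 keeps everything on CPU.

def build_llama_kwargs(model_path: str, n_ctx: int = 4096,
                       n_threads: int = 4) -> dict:
    """Collect llama.cpp loader options for a CPU-only run."""
    return {
        "model_path": model_path,   # local .gguf file (placeholder)
        "n_ctx": n_ctx,             # context window
        "n_threads": n_threads,     # CPU threads (assumption, tune per host)
        "n_gpu_layers": 0,          # 0 = no layers offloaded to GPU
    }

def run_cpu_inference(model_path: str, prompt: str) -> str:
    # Imported lazily so the kwargs helper works without the binding installed.
    from llama_cpp import Llama

    llm = Llama(**build_llama_kwargs(model_path))
    out = llm(prompt, max_tokens=64)
    return out["choices"][0]["text"]

# Example (needs a local GGUF file, so not run here):
# print(run_cpu_inference("phi-3-mini-4k-instruct-q4.gguf", "Hello,"))
```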
https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf