Aisuko opened 6 months ago
Hugging Face transformers already supports GGUF, but only for a handful of model architectures. So we will run some tests first; if they pass, we can support CPU-accelerated inference smoothly. See our discussion for more detail.
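As a starting point for those tests, a minimal sketch of loading a GGUF checkpoint through transformers (which dequantizes supported architectures on load). The quantization filename passed as `gguf_file` is an assumption, not confirmed against the repo contents:

```python
# Sketch: loading a GGUF checkpoint with Hugging Face transformers.
# Only a limited set of architectures is supported, so this is the
# kind of check we want to run first.

def gguf_load_kwargs(repo_id: str, gguf_file: str) -> dict:
    """Build the kwargs transformers needs to load a single GGUF file."""
    return {
        "pretrained_model_name_or_path": repo_id,
        "gguf_file": gguf_file,  # tells transformers to read the GGUF weights
    }

def load_gguf(repo_id: str, gguf_file: str):
    # Imported lazily so the helper above stays usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = gguf_load_kwargs(repo_id, gguf_file)
    tokenizer = AutoTokenizer.from_pretrained(**kwargs)
    model = AutoModelForCausalLM.from_pretrained(**kwargs)
    return tokenizer, model

# Example (downloads the model, so not run here; filename is an assumption):
# tok, model = load_gguf("microsoft/Phi-3-mini-4k-instruct-gguf",
#                        "Phi-3-mini-4k-instruct-q4.gguf")
```

If the architecture is unsupported, `from_pretrained` will raise, which is exactly the signal the test pass is meant to collect.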
Currently, we support CPU-accelerated inference using llama.cpp. However, we will keep working on the kimchima repo: we still need to implement CPT and fine-tuning in kimchima.
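For reference, the current CPU path can be sketched with the llama-cpp-python bindings to llama.cpp. The model path and thread count here are placeholders, not values from the repo:

```python
# Sketch: CPU-only inference on a GGUF model via llama-cpp-python
# (pip install llama-cpp-python). n_gpu_layers=0 keeps everything on CPU.

def build_llama_kwargs(model_path: str, n_ctx: int = 4096,
                       n_threads: int = 4) -> dict:
    """Collect llama.cpp loader options for a CPU-only run."""
    return {
        "model_path": model_path,   # local .gguf file (placeholder)
        "n_ctx": n_ctx,             # context window
        "n_threads": n_threads,     # CPU threads (assumption, tune per host)
        "n_gpu_layers": 0,          # 0 = no layers offloaded to GPU
    }

def run_cpu_inference(model_path: str, prompt: str) -> str:
    # Imported lazily so the kwargs helper works without the binding installed.
    from llama_cpp import Llama

    llm = Llama(**build_llama_kwargs(model_path))
    out = llm(prompt, max_tokens=64)
    return out["choices"][0]["text"]

# Example (needs a local GGUF file, so not run here):
# print(run_cpu_inference("phi-3-mini-4k-instruct-q4.gguf", "Hello,"))
```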
https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf