SkywardAI / voyager

An OpenAI-like API service for the SkywardAI ecosystem
Apache License 2.0

[Feature]: Inference and embedding support CPU/GPU offload #41

Open Aisuko opened 3 months ago

Aisuko commented 3 months ago

Contact Details(optional)

No response

What feature are you requesting?

We already support GPU inference and embedding in the Kirin project, so we should also support GPU in this project. Furthermore, please keep in mind what I mentioned in the last meeting: we want combined CPU/GPU offload, not separate CPU-only and GPU-only modes.

https://medium.com/@aisuko/quantization-tech-of-llms-gguf-0342a08f082c
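
To make the "offload, not separate modes" idea concrete, here is a minimal sketch of how a split between GPU and CPU could be planned. It assumes a llama.cpp-style backend for GGUF models, where a single setting (for example `n_gpu_layers` in llama-cpp-python) controls how many transformer layers go to the GPU while the rest stay on the CPU. Whether this project uses such a backend is an assumption, and `plan_offload`, `OffloadPlan`, and all the sizes below are hypothetical names and numbers for illustration only.

```python
from dataclasses import dataclass

MiB = 1024 ** 2
GiB = 1024 ** 3


@dataclass(frozen=True)
class OffloadPlan:
    """Hypothetical result type: how many layers run on each device."""
    gpu_layers: int
    cpu_layers: int


def plan_offload(total_layers: int, bytes_per_layer: int,
                 free_vram_bytes: int, reserve_bytes: int = 0) -> OffloadPlan:
    """Put as many layers on the GPU as fit; keep the remainder on the CPU.

    bytes_per_layer is a rough per-layer estimate (assumption); reserve_bytes
    is headroom kept free for things like the KV cache and scratch buffers.
    """
    if total_layers < 0 or bytes_per_layer <= 0:
        raise ValueError("total_layers must be >= 0 and bytes_per_layer > 0")
    usable = max(free_vram_bytes - reserve_bytes, 0)
    gpu_layers = min(total_layers, usable // bytes_per_layer)
    return OffloadPlan(gpu_layers=gpu_layers, cpu_layers=total_layers - gpu_layers)


# Example with made-up numbers: 32 layers of ~200 MiB each, 4 GiB free VRAM,
# 512 MiB held back as headroom.
plan = plan_offload(32, 200 * MiB, 4 * GiB, reserve_bytes=512 * MiB)
print(plan)
```

With a plan like this, the same code path covers CPU-only (0 GPU layers), GPU-only (all layers on the GPU), and everything in between, so there is no need for two separate modes. In a llama.cpp-style backend the resulting `gpu_layers` value would be passed as the layer-offload setting when loading the model.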