EricLBuehler / candle-vllm

Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server.
MIT License
225 stars 23 forks

Using candle-vllm as a crate in Rust? #62

Open gkvoelkl opened 1 month ago

gkvoelkl commented 1 month ago

Hi Eric, great Rust program.

I am looking for a crate so I can use a chatbot function within my Rust program. I tried to do that with candle. I hope it will be better documented in the future.

Will it be possible to call a candle-vllm function without starting an explicit server, so I can use candle-vllm within my program?

Thanks

Best regards, Gerhard

EricLBuehler commented 1 month ago

Hi @gkvoelkl! Candle-vllm is a great option: you can see an example of how to build such a chatbot here in pure Rust: openai_server.rs.
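In the meantime, one straightforward way to drive candle-vllm from inside another Rust program is to run the server and call its OpenAI-compatible chat completions endpoint over HTTP. Below is a minimal sketch using `reqwest` (with the `json` feature), `tokio`, and `serde_json`; the host, port, and model id are placeholders for your own configuration, not candle-vllm defaults:

```rust
use serde_json::{json, Value};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder address: point this at wherever your candle-vllm server listens.
    let url = "http://localhost:2000/v1/chat/completions";

    // Standard OpenAI-style chat completion request body.
    let body = json!({
        "model": "llama", // placeholder model id
        "messages": [
            { "role": "user", "content": "Hello! Who are you?" }
        ],
        "temperature": 0.7
    });

    let resp: Value = reqwest::Client::new()
        .post(url)
        .json(&body)
        .send()
        .await?
        .json()
        .await?;

    // OpenAI-compatible servers return the reply under choices[0].message.content.
    if let Some(content) = resp["choices"][0]["message"]["content"].as_str() {
        println!("{content}");
    }
    Ok(())
}
```

This keeps your application code decoupled from candle-vllm internals, at the cost of running the server as a separate process.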

I would also recommend checking out mistral.rs: besides PagedAttention it has Metal and CPU support, vision models, adapter models, quantization, and a plethora of other features, including a crate meant for use inside an application (docs, examples). A sketch of that in-process usage follows.
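For comparison, here is a sketch of embedding mistral.rs as a crate, modeled on its documented examples; the builder names (`TextModelBuilder`, `TextMessages`) and the model id are taken from those examples and may differ between versions:

```rust
use anyhow::Result;
use mistralrs::{TextMessageRole, TextMessages, TextModelBuilder};

#[tokio::main]
async fn main() -> Result<()> {
    // Load a model directly in-process; no separate server needed.
    // Model id copied from the mistral.rs examples; substitute your own.
    let model = TextModelBuilder::new("microsoft/Phi-3.5-mini-instruct")
        .with_logging()
        .build()
        .await?;

    // Build a chat-style prompt.
    let messages = TextMessages::new()
        .add_message(TextMessageRole::System, "You are a helpful assistant.")
        .add_message(TextMessageRole::User, "Hello! Who are you?");

    // Run inference in-process and print the assistant's reply.
    let response = model.send_chat_request(messages).await?;
    println!("{}", response.choices[0].message.content.as_ref().unwrap());
    Ok(())
}
```

This is exactly the "call a function without starting an explicit server" pattern asked about above: the model lives inside your own process.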