dusty-nv / NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
https://dusty-nv.github.io/NanoLLM/
MIT License

How to support other models? #23

Closed PredyDaddy closed 1 week ago

PredyDaddy commented 2 weeks ago

Hello, I need to deploy some VLMs that NanoLLM does not yet support, such as Qwen-VL, on the Orin. Can you tell me how to add other VLMs to NanoLLM?

Many many thanks!

dusty-nv commented 2 weeks ago

Hi @PredyDaddy, supporting a quantized VLM first requires you to have the underlying LLM (Qwen) working, along with any vision encoders / projectors it uses. The VLM pipeline then gets configured in NanoLLM.config_vision() and NanoLLM.init_vision().
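
For illustration, here is a minimal sketch of where a new model's vision stack could be wired in. Only the `config_vision()` / `init_vision()` hook names come from the comment above; the subclass, the `QwenVLEncoder` class, and the config keys are hypothetical placeholders that would need to match the actual NanoLLM internals:

```python
# Hypothetical sketch of adding a new VLM's vision pipeline to NanoLLM.
# Only the config_vision()/init_vision() hooks are named in this thread;
# everything else below is illustrative and unverified.

from nano_llm import NanoLLM

class QwenVLModel(NanoLLM):
    def config_vision(self):
        # Inspect the HuggingFace checkpoint config and record which
        # vision encoder / projector this model expects.
        super().config_vision()
        if 'qwen' in getattr(self.config, 'model_type', '').lower():
            self.config.vision_model = 'qwen-vl-vit'   # hypothetical key

    def init_vision(self):
        # Instantiate the vision encoder so generation can turn images
        # into embedding tokens for the quantized LLM.
        if getattr(self.config, 'vision_model', None) == 'qwen-vl-vit':
            self.vision = QwenVLEncoder(self.model_path)  # hypothetical class
        else:
            super().init_vision()
```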

Most of the VLMs supported so far have been Llava-based, so they follow the same CLIP/SigLIP -> mm_projector -> llama flow. However, I am currently adding support for OpenVLA, which differs in its vision encoders. Hopefully I will be able to check that code in soon, and it will serve as a better example of how alternate VLMs are supported alongside the Llava-esque ones.
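
To make that Llava-style data path concrete, here is a rough sketch of the flow. The module and function names are illustrative, not NanoLLM's actual API; only the encoder -> mm_projector -> LLM ordering comes from the comment above:

```python
# Rough sketch of the Llava-style flow described above
# (vision encoder -> mm_projector -> llama). All names are illustrative.

import torch
import torch.nn as nn

class MMProjector(nn.Module):
    """Maps vision-encoder patch features into the LLM embedding space."""
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        # Llava-1.5 uses a small 2-layer MLP for this step
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, features):
        return self.mlp(features)

def embed_multimodal(image, prompt_ids, vision_encoder, projector, embed_tokens):
    # 1. Encode the image into patch features (e.g. CLIP/SigLIP output)
    patches = vision_encoder(image)          # [num_patches, vision_dim]
    # 2. Project them into the LLM's token-embedding space
    image_embeds = projector(patches)        # [num_patches, llm_dim]
    # 3. Embed the text tokens and splice the image embeddings in front
    text_embeds = embed_tokens(prompt_ids)   # [num_tokens, llm_dim]
    return torch.cat([image_embeds, text_embeds], dim=0)
```

A model like OpenVLA breaks this pattern mainly at step 1, which is why the vision-encoder setup has to be configurable rather than hardcoded.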