Hi @PredyDaddy, supporting quantized VLMs first requires you to have the LLM working (qwen) along with any vision encoders / projectors it uses. The VLM pipeline then gets configured in NanoLLM.config_vision() and NanoLLM.init_vision()
Most of the ones so far have been Llava-based, so they follow the same CLIP/SigLIP -> mm_projector -> llama flow. However, I am currently adding support for OpenVLA, which handles its vision encoders differently. Hopefully I will be able to check that code in soon, and it will serve as a better example of how alternate VLMs are supported alongside the Llava-esque ones.
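For reference, the Llava-style flow looks roughly like the sketch below. The model names, projector dimensions, and the embed_image() helper are just illustrative here, not NanoLLM's actual implementation:

```python
# Rough sketch of the CLIP/SigLIP -> mm_projector -> LLM flow.
# Illustrative only; NanoLLM's real pipeline lives in config_vision()/init_vision().
import torch
import torch.nn as nn
from transformers import CLIPVisionModel, CLIPImageProcessor

vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

# mm_projector maps the vision hidden size to the LLM hidden size (4096 assumed here)
mm_projector = nn.Sequential(
    nn.Linear(vision.config.hidden_size, 4096),
    nn.GELU(),
    nn.Linear(4096, 4096),
)

def embed_image(pil_image):
    """Return image token embeddings to splice into the LLM's input embeddings."""
    pixels = processor(images=pil_image, return_tensors="pt").pixel_values
    with torch.no_grad():
        # Llava-style: take the penultimate layer's patch features and drop the CLS token
        feats = vision(pixels, output_hidden_states=True).hidden_states[-2][:, 1:]
        return mm_projector(feats)  # shape [1, num_patches, llm_hidden]
```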
Hello dusty, I am currently researching Visual Language Models (VLMs) for scene analysis and prompt-based grounding tasks. I need a model that supports Chinese and has strong grounding capabilities, and Qwen-VL seems to be a good fit. Can I continue to ask questions here about how to run Qwen-VL inference with NanoLLM?
thanks!
@PredyDaddy if you want optimized quantization for Qwen-VL, first check whether MLC is able to load the base LLM, then check what its vision encoder(s) run with. To quantize it, you basically decompose the VLM back into its vision encoders + LLM, then run the LLM weights through the quantization tools.
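Roughly, that decomposition could look like the sketch below: split the checkpoint's state_dict so the language-model weights can be fed to MLC's quantization tools on their own. The "transformer.visual" key prefix is what I'd expect for Qwen-VL, but verify it against the actual checkpoint, and the output filenames are just placeholders:

```python
# Split a Qwen-VL checkpoint into vision-encoder weights and LLM-only weights,
# so the LLM half can go through MLC's quantization flow separately.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", trust_remote_code=True, torch_dtype=torch.float16
)

state = model.state_dict()
vision_keys = {k for k in state if k.startswith("transformer.visual")}  # assumed prefix
llm_state = {k: v for k, v in state.items() if k not in vision_keys}

torch.save({k: state[k] for k in vision_keys}, "qwen_vl_vision.pt")
torch.save(llm_state, "qwen_vl_llm.pt")  # feed these to the MLC quant tools
```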
If you just want to run Qwen-VL, then hopefully you can install that repo on top of PyTorch/Transformers/etc. from jetson-containers and try it that way first. That is how I always begin with the upstream models.
Hello, I need to deploy some VLMs that NanoLLM does not yet support, such as Qwen-VL, to Orin. Can you let me know how to add other VLMs to NanoLLM?
Many many thanks!