gokayfem / ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Apache License 2.0

requesting deepseek-vl and qwen-vl nodes #89

Closed: synryn closed this issue 1 month ago

synryn commented 6 months ago

https://huggingface.co/Qwen/Qwen-VL-Chat/tree/main

https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat

I've gotten extremely good results from these. It would be great to have them included by default as nodes in ComfyUI.

gokayfem commented 6 months ago

I'm planning to add Mantis, LLaVA Llama 3, LLaVA Phi-3, Idefics 8B, DeepSeek-VL, Qwen-VL, InternLM 4KHD, nanoLLaVA, PaliGemma, CogVLM2, and comu all together. I'm currently working on it.

synryn commented 6 months ago

> I'm planning to add Mantis, LLaVA Llama 3, LLaVA Phi-3, Idefics 8B, DeepSeek-VL, Qwen-VL, InternLM 4KHD, nanoLLaVA, PaliGemma, CogVLM2, and comu all together. I'm currently working on it.

Wow, amazing. Thanks for the hard work 🫡

gokayfem commented 6 months ago

https://huggingface.co/gokaygokay

synryn commented 6 months ago

The new Phi-3 Vision just dropped too, though I haven't tested it yet: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct

gokayfem commented 6 months ago

Yes, I'm aware of it and have tested it. I don't know if it was bad luck, but it failed on my two test pictures that contain text. Thanks for the suggestion. The benchmarks look extremely good, but I don't know yet whether that carries over to real-world use.