gokayfem / ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Apache License 2.0

EvoVLM-JP-v1-7B Support #58

Closed by saftle 6 months ago

saftle commented 7 months ago

Currently one of the best VLMs is EvoVLM-JP-v1-7B, which was released this week. You can grab it here: https://huggingface.co/SakanaAI/EvoVLM-JP-v1-7B. The Hugging Face page also includes code for running it locally.

I'd love to see this working in ComfyUI, considering it scores even better than LLaVA 1.6. You can see the scores here:

[Image: evovlm-results]

(Source: https://sakana.ai/evolutionary-model-merge/)

It also works perfectly in English, although you sometimes need to remind it to respond only in English. Otherwise, the VLM is quite accurate at describing images.

Thanks for the consideration!

gokayfem commented 7 months ago

We should request a GGUF version of this; maybe they can provide one. Even with these scores, I don't think many people will use it without a GGUF or GPTQ version. What do you think? I will consider adding this.

saftle commented 7 months ago

@gokayfem I believe this one is a GPTQ version: https://huggingface.co/camenduru/EvoVLM-JP-v1-7B-4bit

gokayfem commented 7 months ago

I will try to make it work today.

saftle commented 6 months ago

@gokayfem Thanks for the help, but it looks like it already works with this GGUF version: https://huggingface.co/mmnga/SakanaAI-EvoLLM-JP-v1-7B-gguf/tree/main