Local Tiny AI Vision Language Model (1.6B)

Nuked88 / ComfyUI-N-Nodes

A suite of custom nodes for ComfyUI that includes GPT text-prompt generation, LoadVideo, SaveVideo, LoadFramesFromFolder and FrameInterpolator

MIT License

203 stars, 22 forks

Local Tiny AI Vision Language Model (1.6B) #30

Closed: BenderBlender closed this issue 8 months ago

BenderBlender commented 9 months ago

I wanted to use the "blip analyze image" node in my workflow, but unfortunately it stopped working after the latest ComfyUI updates. However, an excellent vision-capable model has appeared: Local Tiny AI Vision Language Model (1.6B).

It would be great if you could add support for this model to the nodes! https://github.com/vikhyat/moondream?tab=readme-ov-file

https://www.youtube.com/watch?v=oDGQrOlmC1s

BenderBlender commented 9 months ago

The author has already done it himself :) https://github.com/shadowcz007/comfyui-moondream

Nuked88 commented 9 months ago

Damn, this is interesting! Thanks for sharing! I didn't know there were such small models able to do that (I was stuck on the main LLaVA model, which is great but also really heavy). I will definitely check it out!

BenderBlender commented 9 months ago

I put together a workflow with your node that draws an image itself and then tries to improve its own creation using this image recognition model :)

Nuked88 commented 9 months ago

Very nice!

BenderBlender commented 9 months ago

The guys have already added a 2nd model, and it seems even better than the first. Much better, I would say... https://github.com/zhongpei/Comfyui_image2prompt

BenderBlender commented 9 months ago

Could you recommend a good model (GGUF GPT) that will work with your node? There are so many of them that I don't know where to look. Maybe there is some kind of ranking?

Nuked88 commented 8 months ago

Could you recommend a good model (GGUF GPT) that will work with your node? There are so many of them that I don't know where to look. Maybe there is some kind of ranking?

As you said, there are too many of them, so I really can't say which is better than which. But yes, there are a lot of benchmark tests used for rating LLMs; you can find a summary here: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard or here: https://eqbench.com/

Nuked88 commented 8 months ago

OK, I've added support for the LLaVA, moondream and JoyTag models. I'm trying to add InternLM (I've written most of the code), but I'm having some issues. Since it is also a heavier model, I have to give it lower priority, so I think I will add it, but not right away.