DeabLabs / cannoli

Cannoli allows you to build and run no-code LLM scripts using the Obsidian Canvas editor.
MIT License

Image / vision API support #16

Closed: dev-msp closed this 1 month ago

dev-msp commented 10 months ago

Just discovered this, and I both detest and greatly respect that you actually built (and surpassed) something I merely daydreamed about idly one day. Congrats! :D

Recently, images have started serving as both LLM input (the Vision API beta) and output (DALL-E). Now I can't help but wonder about the potential for image nodes in Cannoli!

I haven't familiarized myself with the code yet, but in theory I'm very much down to contribute as well.

teleologia commented 10 months ago

I've been testing vision with the action nodes. Got it working like this:

[Attached canvas screenshot: TEST vision cno]
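For reference, the underlying request an action node needs to make is the standard OpenAI chat completions call with image content. A minimal sketch, not Cannoli's own code; the function name, prompt text, and key handling are illustrative:

```typescript
// Illustrative sketch of the request an HTTP/action node would need to send
// to OpenAI's chat completions endpoint with image input. The model name
// reflects the vision-preview model available at the time of this thread.
async function describeImage(apiKey: string, imageUrl: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-4-vision-preview",
      max_tokens: 300,
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: "What is in this image?" },
            // Accepts either a public image URL or a base64 data URI such as
            // "data:image/jpeg;base64,<encoded bytes>"
            { type: "image_url", image_url: { url: imageUrl } },
          ],
        },
      ],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content;
}
```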

blindmansion commented 10 months ago

Thanks so much for trying cannoli!

Yes, this would be awesome, and with all the new API features out, it's high on my list.

Also, awesome work solarbotanist! Are you saying that this is working as intended, and the images are being sent and received correctly? That's amazing! I always wanted to write an OpenAI action node for the meme, but hadn't yet.

teleologia commented 10 months ago

Yes, everything is working as intended! The action node as described can process both online image URIs as well as base64-encoded images.

A base64 encoding function could be implemented in the plugin's TypeScript so that images in the vault could be used directly, along the lines of the sketch below.
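Something like this could be a starting point, assuming Obsidian's `Vault.readBinary` and `arrayBufferToBase64` helpers; the function name and MIME handling are just illustrative:

```typescript
import { App, TFile, arrayBufferToBase64 } from "obsidian";

// Illustrative sketch: read an image file from the vault and return a data
// URI that could be passed to the vision API's image_url field.
async function vaultImageToDataUri(app: App, path: string): Promise<string> {
  const file = app.vault.getAbstractFileByPath(path);
  if (!(file instanceof TFile)) {
    throw new Error(`No file found at ${path}`);
  }
  const buffer = await app.vault.readBinary(file);
  const base64 = arrayBufferToBase64(buffer);
  // Naive MIME guess from the extension; a real implementation would cover
  // more image types.
  const mime = file.extension === "png" ? "image/png" : "image/jpeg";
  return `data:${mime};base64,${base64}`;
}
```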

lightningRalf commented 4 months ago

Any updates on this?

blindmansion commented 4 months ago

Not yet, but I just looked at OpenAI's vision docs at least, and it's not as bad as I thought it would be. So it may not be that far off after all.

blindmansion commented 1 month ago

Vision is now implemented! All you have to do is embed an image from a URL or your vault using Markdown syntax: `![[your_image.jpg]]` or `![description](imageurl.com)`.
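For example, a node that mixes a prompt with an embedded vault image might look like this (the file name is made up):

```markdown
Describe what's happening in this image in one sentence.

![[whiteboard_sketch.jpg]]
```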

We've also added a built-in DALL-E action.