[FEATURE] Add a Node/Document Loader for Image Embedding Using Multimodal LLMs

Describe the feature you'd like:

I would like to request a node or document loader in Flowise that embeds images using a multimodal large language model (LLM). This would let users process and embed images alongside textual data, extending the platform to multimodal datasets.

Use Cases:

- Enhancing search and retrieval by indexing and querying images with semantic embeddings.
- Enabling richer applications that combine image and text data, such as multimedia question answering or document analysis.
- Supporting workflows where visual information is a key component, such as analyzing diagrams, infographics, or scanned documents.

Additional context:

It would be helpful to support commonly used image embedding models such as CLIP, BLIP, amazon.titan-embed-image-v1, or similar state-of-the-art multimodal models. Integration with existing Flowise workflows, such as embedding chaining or retrieval augmentation, would be a key aspect of this feature.
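To make the retrieval use case more concrete, here is a rough sketch of what I have in mind. It is not based on any existing Flowise API: `ImageEmbedder`, `IndexedImage`, `InMemoryImageIndex`, and `rankBySimilarity` are hypothetical names I made up for illustration, and the sketch assumes the model (CLIP-style) maps images and text into the same vector space so a text query can be compared against image vectors with cosine similarity.

```typescript
// Hypothetical interface (assumption, not a Flowise or vendor API):
// any multimodal model that embeds images and text into one shared space.
interface ImageEmbedder {
  embedImage(bytes: Uint8Array): Promise<number[]>;
  embedText(text: string): Promise<number[]>;
}

// One indexed image: an identifier plus its embedding vector.
interface IndexedImage {
  id: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
// Returns 0 if either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank indexed images against a query vector, best match first.
function rankBySimilarity(
  query: number[],
  items: IndexedImage[],
  topK: number
): { id: string; score: number }[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Hypothetical in-memory index; a real node would presumably write
// to whichever vector store the flow is already connected to.
class InMemoryImageIndex {
  private items: IndexedImage[] = [];
  private embedder: ImageEmbedder;

  constructor(embedder: ImageEmbedder) {
    this.embedder = embedder;
  }

  // Embed an image and store its vector under the given id.
  async add(id: string, bytes: Uint8Array): Promise<void> {
    const vector = await this.embedder.embedImage(bytes);
    this.items.push({ id, vector });
  }

  // Embed a text query and return the closest images.
  async search(text: string, topK: number = 3): Promise<{ id: string; score: number }[]> {
    const query = await this.embedder.embedText(text);
    return rankBySimilarity(query, this.items, topK);
  }
}
```

In this sketch the loader side would call `add` once per image, and the retrieval side would call `search` with the user's question. The embedder is kept behind an interface so that CLIP, Titan, or another model could be swapped in without changing the rest of the flow.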

I’m happy to provide further details or examples if needed!

Thanks.
