Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License

Support Image Uploading #121

Closed. timothycarambat closed this issue 1 year ago

timothycarambat commented 1 year ago

The document processor should support the uploading and embedding of images like PNG, JPEG, and other static formats.

Ideally, the processor should generate a text description of the image and embed that text, instead of trying to do a multi-modal embedding, which would be impossible to search over textually.
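A minimal sketch of the describe-then-embed idea, assuming a captioning function is available. The names `image_to_document`, `TextDocument`, `IMAGE_EXTENSIONS`, and the `describe` callable are hypothetical and are not part of AnythingLLM; `describe` stands in for whatever captioning model (local or hosted) ends up being used.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical: static image formats the processor would accept.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


@dataclass
class TextDocument:
    """Plain-text stand-in for an image, ready for a regular text embedder."""
    source: str
    page_content: str


def image_to_document(path: str, describe: Callable[[bytes], str]) -> TextDocument:
    """Describe an image and wrap the description as an embeddable document.

    `describe` is a placeholder for any captioning model; it receives the raw
    image bytes and returns a caption string.
    """
    file = Path(path)
    if file.suffix.lower() not in IMAGE_EXTENSIONS:
        raise ValueError(f"{file.suffix} is not a supported image format")

    caption = describe(file.read_bytes()).strip()
    if not caption:
        raise ValueError(f"No description produced for {file.name}")

    # Only the caption text is kept, so the result can be chunked, embedded,
    # and searched exactly like any other text document.
    return TextDocument(source=file.name, page_content=caption)
```

Because the output is ordinary text, no multi-modal embedding model is needed, and a textual query can match the caption directly.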

AntonioCiolino commented 1 year ago

Are you thinking that it should use something like BERT or Deepdanbooru to extract info from the image?

timothycarambat commented 1 year ago

> Are you thinking that it should use something like BERT or Deepdanbooru to extract info from the image?

Both of these would be an issue to run locally since they require substantial resources. Deepdanbooru is also specific to anime-girl image tagging and tends to give more NSFW results, so honestly the easiest implementation is just using something simple like OpenAI's CLIP, which can run on Replicate pretty easily (but will still cost money):

https://replicate.com/rmokady/clip_prefix_caption

AntonioCiolino commented 1 year ago

If you are calling out to external resources, there are lots of choices, of course.

phicha20224 commented 5 months ago

AnythingLLM returns: "File extension .jpg not supported for parsing and cannot be assumed as text file type."

timothycarambat commented 5 months ago

@phicha20224 - that is because we don't support uploading images right now
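For illustration only, a sketch of how an extension check can produce an error like the one quoted above. This is not AnythingLLM's actual code: `SUPPORTED_EXTENSIONS` and `check_parsable` are hypothetical names, and the extension list is an assumption. Only the error wording is taken from the report in this thread.

```python
from pathlib import Path

# Hypothetical set of extensions that have a text parser; not the project's real list.
SUPPORTED_EXTENSIONS = {".txt", ".md", ".pdf", ".docx"}


def check_parsable(filename: str) -> str:
    """Return the normalized extension, or raise if no parser handles it."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        # Image formats such as .jpg fall through to here until a parser
        # (for example, a captioning step) is registered for them.
        raise ValueError(
            f"File extension {ext} not supported for parsing "
            "and cannot be assumed as text file type."
        )
    return ext
```

Under this sketch, supporting images would mean adding the image extensions to the lookup and routing them to a step that returns text.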

xyz-rainbow commented 4 months ago

What do I need to do to get support for images?