Closed flozi00 closed 1 year ago
@flozi00 nice suggestion. I also wanted to suggest the same. It would be nice to support image documents, which would suit VQA, image search, and other use cases.
A few concerns I have are -
Overall it is nice to have in Haystack in my view, but adding it will require a good design discussion and proper long-term planning. Frequent breaking changes would not be good. Also, I see deepset already has their hands full, and they would need active support from the community.
I see a lot of good suggestions from the community, so how about having an experimental feature stream as a playground for these features, and graduating matured features to the mainline?
@Timoeller @tholor @PiffPaffM
It's pretty clear to me that we will eventually add other data types to Haystack. The vision here is really to build natural language interfaces to all kinds of data. This includes texts, images, tables, databases, logs ...
However, we want to nail the text case first and optimize it really end-to-end instead of allowing 5 formats with "50% solutions". TableQA is probably one of the bigger next additions and we are actively working on it right now. So long story short, VQA is nothing that we will work on in the next weeks for sure, but it's on the long-term roadmap.
@tholor I am align with the vision. My only concern is prioritization. Hence suggested if we have process around it. In my view these are two most time consuming steps and of-course critical: Design Discussion and Code Review. Now able to come up with solution to resolve it.
Regarding experimental stream, I mean separate to have module experimental
and branch experimental
. Which will daily rebased with master. Any new code like VQA, CLIP which is not part of current roadmap or plan will go there. It will have nightly release. So people can contribute there which will have less stringent code review and design process. And once every month or quarter these can be bring to mainline based on user's feedback and roadmap (of course it will go through design discussion and code review). This is just my suggestion, I am open for other idea as well.
Is your feature request related to a problem? Please describe. No, it would be just cool
Describe the solution you'd like Indexing and searching for images by text
Describe alternatives you've considered Jina already does this, but since CLIP is in the latest huggingface release it would be cool to have it here too
Additional context I did some runs locally with my own photos and the results were amazing. Describing images instead of just using keywords improves the performance massively; even unusual queries work fine
But the biggest question I have is whether you want to have vision data in this framework or not?
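For anyone curious about the mechanics: the text-to-image search flow described above is essentially "embed every image once, embed the query, rank by cosine similarity". A minimal sketch of that pattern, with placeholder encoders standing in for CLIP's image and text towers (in practice you would use `CLIPModel.get_image_features` / `get_text_features` from huggingface transformers; the file names here are hypothetical):

```python
import numpy as np

# Placeholder encoders standing in for CLIP's image/text towers.
# Real CLIP maps both modalities into the same embedding space, so
# a text query can be compared directly against image vectors.
rng = np.random.default_rng(0)

def embed_images(paths):
    # hypothetical: one 512-d unit vector per image
    vecs = rng.normal(size=(len(paths), 512))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def embed_text(query):
    vec = rng.normal(size=512)
    return vec / np.linalg.norm(vec)

# "Index" a photo collection once...
paths = ["beach.jpg", "dog.jpg", "mountains.jpg"]  # hypothetical files
index = embed_images(paths)

# ...then search it with a natural-language query.
query_vec = embed_text("a dog playing in the park")
scores = index @ query_vec          # cosine similarity (unit vectors)
ranked = [paths[i] for i in np.argsort(-scores)]
print(ranked[0])  # best-matching image for the query
```

With real CLIP embeddings, the ranking reflects semantic similarity between the query text and the image content, which is what makes descriptive queries work so much better than keyword matching.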
Can you please share a reference link for the one you've tried? I'd like to see the results as well.
Thanks, Rakesh.
CLIP support was implemented by @ZanSara in #2418.
I think that this issue can be closed now.