deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Port Haystack v1 DocumentClassifier node to Haystack v2 #7669

Status: Closed (ms130 closed this issue 3 weeks ago)

ms130 commented 5 months ago

Is your feature request related to a problem? Please describe.

I've been using the DocumentClassifier node in Haystack v1 with a zero-shot classification model to label documents with categories, which are attached to their metadata. We have recently migrated our code to Haystack v2 but have discovered that this component does not yet exist in v2, so I'm currently unable to classify documents.

Describe the solution you'd like

It would be great if someone could port this very useful v1 node to a v2 component, please! It would also be tremendously useful to add the multi_label argument (see here) to the new component, so that the model can be run on the assumption that several labels can be true at once. The existing v1 node doesn't provide this flexibility, so I created a custom node by subclassing it and modifying its behaviour.
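To make the multi_label request concrete, here is a minimal, dependency-free sketch of how the two scoring modes differ in a typical NLI-based zero-shot classifier. It assumes each candidate label yields a (contradiction, entailment) logit pair. The function names and the logit values are made up for illustration and are not part of Haystack.

```python
import math
from typing import Dict, List, Tuple


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_labels(
    nli_logits: Dict[str, Tuple[float, float]], multi_label: bool
) -> Dict[str, float]:
    """Turn per-label (contradiction, entailment) logits into scores.

    multi_label=False: softmax over the entailment logits of all labels,
    so the scores compete and sum to 1.
    multi_label=True: each label is scored on its own, as the entailment
    probability against its own contradiction logit.
    """
    labels = list(nli_logits)
    if multi_label:
        return {label: softmax(list(nli_logits[label]))[1] for label in labels}
    entailment = [nli_logits[label][1] for label in labels]
    return dict(zip(labels, softmax(entailment)))


# Hypothetical logits for one document (illustrative values only).
logits = {"sports": (-1.0, 3.0), "politics": (-0.5, 2.5), "cooking": (2.0, -2.0)}

single = score_labels(logits, multi_label=False)
multi = score_labels(logits, multi_label=True)
print(single)
print(multi)
```

With multi_label=False the two plausible labels split the probability mass between them, whereas with multi_label=True both can score close to 1 independently. That is the behaviour I need when a document belongs to several categories.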

Describe alternatives you've considered

I considered creating my own custom DocumentClassifier component in v2, but have not started this yet, and am unsure about how difficult it would be.

anakin87 commented 4 months ago

This is a legitimate request!

I would start by implementing a TransformersZeroShotDocumentClassifier, focusing only on zero-shot classification.

The code should not be difficult to migrate, starting from the 1.x version.

I will tag this issue as "contributions wanted" and see if any community members would like to address it.
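For anyone picking this up, the rough shape could look like the sketch below. It is only an illustration under stated assumptions: `Doc`, `ZeroShotDocClassifierSketch`, `keyword_scorer` and the `classification` meta key are hypothetical names, the class is plain Python rather than a real Haystack component, and the scorer is a toy stand-in for an actual zero-shot model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus metadata."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


# A scorer takes (text, labels, multi_label) and returns
# {"labels": [...], "scores": [...]} sorted by descending score.
Scorer = Callable[[str, List[str], bool], Dict[str, list]]


class ZeroShotDocClassifierSketch:
    """Attaches the predicted label and scores to each document's meta."""

    def __init__(self, labels: List[str], scorer: Scorer,
                 multi_label: bool = False, meta_key: str = "classification"):
        if not labels:
            raise ValueError("At least one candidate label is required.")
        self.labels = labels
        self.scorer = scorer
        self.multi_label = multi_label
        self.meta_key = meta_key  # assumed key name, not confirmed by the issue

    def run(self, documents: List[Doc]) -> Dict[str, List[Doc]]:
        for doc in documents:
            result = self.scorer(doc.content, self.labels, self.multi_label)
            doc.meta[self.meta_key] = {
                "label": result["labels"][0],   # top-ranked label
                "score": result["scores"][0],
                "details": dict(zip(result["labels"], result["scores"])),
            }
        # Return a dict of named outputs, as v2 components do.
        return {"documents": documents}


def keyword_scorer(text: str, labels: List[str], multi_label: bool) -> Dict[str, list]:
    """Toy scorer: 1.0 if the label word occurs in the text, else 0.0."""
    scores = {label: float(label.lower() in text.lower()) for label in labels}
    ranked = sorted(labels, key=lambda label: scores[label], reverse=True)
    return {"labels": ranked, "scores": [scores[label] for label in ranked]}


docs = [Doc("The football match ended in a draw."),
        Doc("Parliament passed the budget.")]
classifier = ZeroShotDocClassifierSketch(labels=["football", "budget"],
                                         scorer=keyword_scorer)
out = classifier.run(docs)["documents"]
for doc in out:
    print(doc.content, "->", doc.meta["classification"]["label"])
```

In a real port, the scorer would presumably wrap the Hugging Face `zero-shot-classification` pipeline, whose call accepts `candidate_labels` and `multi_label`, and the class would be registered as a proper v2 component.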

srini047 commented 4 months ago

Hi @anakin87, I would like to work on this. If I am not wrong, this zero-shot document classifier must be ported here, in line with the Haystack 2.0 nomenclature?

anakin87 commented 4 months ago

Good to hear... Yes, I think it should be placed in classifiers.

arminnajafi commented 3 months ago

This issue doesn't seem to have moved forward. I'd like to work on it.

Thanks,

nvzard commented 2 months ago

Hey @arminnajafi, please confirm whether you are still working on this. Otherwise, I'd like to pick it up.

jpatra72 commented 1 month ago

Hi @anakin87, I have opened a PR for the zero-shot document classifier. Let me know if you find anything missing in the implementation. :)