Closed sjrl closed 4 months ago
Hi, there is some demand for a DocumentClassifier
component (https://github.com/deepset-ai/haystack/issues/7669). So instead of having a TransformersTextRouter
, another approach with a slightly longer pipeline would be having a DocumentClassifier
connected to a ConditionalRouter
. Did I understand correctly?
Not quite, since the goal of the DocumentClassifier
is to add classification labels to Documents
in their metadata. And then index the Documents
with that metadata into to the DocStore to then be used later by a query pipeline (e.g. through filters
).
While technically it would be possible to use the DocumentClassifier
with the ConditionalRouter
as you outlined you would also have to convert the query into a Document
(so some DocumentBuilder
component), then pass that to the DocumentClassifier
and then the ConditionalRouter
, which is somewhat clunky.
I think it makes sense to have a component that can directly work on Text to perform this routing so we can skip the Document
intermediate. I think this is similar reasoning as to why we have separate TextEmbedder
and DocumentEmbedder
components that use the same underlying models.
What do you think?
Makes sense! Thanks @sjrl
Adding a new component TransformersTextRouter
based on the TextClassification pipeline is a good idea! 👍
@sjrl We won't have capacity to work on it in the next sprint. Would you like to add the component in a PR? Sounds like you already have a clear understanding of what the implementation should look like and you could use the TransformersZeroShotTextRouter as a blue print.
No worries, this isn't something we need urgently, but within the next quarter. I'd be happy to open PR once I have some free time to work on this :)
Is your feature request related to a problem? Please describe. Recently we half ported over this component from v1 with this PR. By this I mean previously the
TransformerQueryClassifier
worked both the the ZeroShotClassification pipeline from HF and the TextClassification pipeline from HF.However, to follow the philosophy of Haystack v2 we decided to split up this old component into two new ones, the
TransformersZeroShotTextRouter
(based on the ZeroShotClassification pipeline) and the yet to be madeTransformersTextRouter
(based on the TextClassification pipeline).Describe the solution you'd like A new Router called
TransformersTextRouter
based on the TextClassification pipeline.Additional context This would allow a query routing based on text classification which the Sol team actively uses to build pipelines in Haystack V1 for clients (e.g. prompt injection classification). Therefore, to ease the transition to v2 for clients it would be great to add this component.