deepset-ai / haystack-core-integrations

Additional packages (components, document stores and the like) to extend the capabilities of Haystack version 2.0 and onwards
https://haystack.deepset.ai
Apache License 2.0

Extend the list of supported (chat) generators in langfuse for custom generators #1153

Open alex-stoica opened 3 weeks ago

alex-stoica commented 3 weeks ago

Is your feature request related to a problem? Please describe.

A custom component might produce output in the correct format for integrations/langfuse/src/haystack_integrations/tracing/langfuse/tracer.py to trace it. However, because the supported generators are hardcoded as follows:

_SUPPORTED_GENERATORS = [
    ...
]
_SUPPORTED_CHAT_GENERATORS = [
    ...
]
_ALL_SUPPORTED_GENERATORS = _SUPPORTED_GENERATORS + _SUPPORTED_CHAT_GENERATORS

tracing will not work unless the custom component is incorrectly placed into one of these predefined categories, which is not a fitting solution.

Describe the solution you'd like

A function like register_generator_for_tracing could be created, with a parameter to specify whether the generator is a chat generator or not. This function would also check whether the output format meets a certain standard before registering the generator.

vblagoje commented 4 days ago

OK, so what you are saying here, @alex-stoica, is that even though we may support tracing for all chat generators known to us, we can never account for customized third-party generators. Because of that, you'd like a mechanism to register your custom generator, correct?

alex-stoica commented 3 days ago

The short answer is "yes". The full context is this: I can resolve tracing issues if I’m able to create custom blocks (including custom generators) and log details about them independently. Currently, however, creating custom generators solves nothing, because the list of generators supported for tracing is hardcoded.

Let's suppose I want to trace a structured output (e.g., Pydantic/dict) alongside the generation. While this isn't yet supported, a function like:

register_generator_for_tracing(
    generator_class=CustomGeneratorModel,
    additional_params_to_trace={"structured_output": "pydantic_model_response"}
)

would make it possible.

TL;DR: I see a need to log details for custom generators (1) and custom I/O from those generators (2). The current issue only covers (1), but the full scope includes (2) as well.
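To make point (2) concrete, here is a rough sketch of how the additional_params_to_trace mapping could be stored and applied. It assumes each mapping key is the field name to record in the trace and each value is a key in the generator's output dict; that reading, the `_EXTRA_TRACE_PARAMS` registry, and the `collect_extra_trace_fields` helper are all hypothetical and do not exist in the integration.

```python
from typing import Any, Dict, Optional, Type

# Hypothetical registry: generator class name -> {trace field name: output key}.
_EXTRA_TRACE_PARAMS: Dict[str, Dict[str, str]] = {}


def register_generator_for_tracing(
    generator_class: Type[Any],
    additional_params_to_trace: Optional[Dict[str, str]] = None,
) -> None:
    """Remember which extra output keys to trace for a custom generator."""
    _EXTRA_TRACE_PARAMS[generator_class.__name__] = dict(
        additional_params_to_trace or {}
    )


def collect_extra_trace_fields(
    component_type: str, output: Dict[str, Any]
) -> Dict[str, Any]:
    """Pick the registered extra fields out of a generator's output dict.

    Output keys that are missing are skipped rather than raising an error.
    """
    mapping = _EXTRA_TRACE_PARAMS.get(component_type, {})
    return {
        trace_name: output[output_key]
        for trace_name, output_key in mapping.items()
        if output_key in output
    }


class CustomGeneratorModel:
    """Placeholder for a user-defined generator."""


register_generator_for_tracing(
    generator_class=CustomGeneratorModel,
    additional_params_to_trace={"structured_output": "pydantic_model_response"},
)
```

A tracer could then call collect_extra_trace_fields with the component's class name and its output, and attach the returned dict to the span next to the regular generation data.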

vblagoje commented 3 days ago

@alex-stoica you seem to have more context and ideas here than I do. Why not hash out some code and open a PR, so we can build this out?