deepset-ai / haystack-core-integrations

Additional packages (components, document stores and the likes) to extend the capabilities of Haystack version 2.0 and onwards
https://haystack.deepset.ai
Apache License 2.0

Feature request for the possibility of adding user_id to the trace while using Haystack<>Langfuse connector #916

Open uvdepanda opened 1 month ago

uvdepanda commented 1 month ago

Hi there,

It seems there is no way to attach a user_id to the trace when using the Haystack<>Langfuse connector. It would be lovely if you could add this feature to your roadmap.

Thanks.

Best regards, Yubraj

vblagoje commented 1 week ago

Let me make sure I understand you correctly, using a Langfuse tracing example, @uvdepanda:

import os

os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"

from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.connectors.langfuse import LangfuseConnector

if __name__ == "__main__":

    pipe = Pipeline()
    pipe.add_component("tracer", LangfuseConnector("Chat example"))
    pipe.add_component("prompt_builder", ChatPromptBuilder())
    pipe.add_component("llm", OpenAIChatGenerator(model="gpt-3.5-turbo"))

    pipe.connect("prompt_builder.prompt", "llm.messages")

    messages = [
        ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
        ChatMessage.from_user("Tell me about {{location}}"),
    ]

    response = pipe.run(data={"prompt_builder": {"template_variables": {"location": "Berlin"}, "template": messages}})
    print(response["llm"]["replies"][0])
    print(response["tracer"]["trace_url"])

What you would like is a payload, say a dict of keys/values, that is passed to the LangfuseConnector (i.e. the "tracer" component) on each run invocation?

So the run invocation becomes:

response = pipe.run(data={
    "prompt_builder": {"template_variables": {"location": "Berlin"}, "template": messages},
    "tracer": {"id": {"user_id": "123"}},
})

Is that correct?

uvdepanda commented 1 week ago

@vblagoje That is exactly what I would like to have. As of now, we are achieving it as follows:

Current Solution:

from langfuse import Langfuse

langfuse = Langfuse()
if context["user_id"]:
    # Pull the trace id out of the URL returned by the tracer component
    trace_url = response["tracer"]["trace_url"]
    trace_id = trace_url.split("/")[-1]
    # Attach our identifier to the existing trace
    langfuse.trace(id=trace_id, session_id=context["user_id"])

This works, but from time to time we run into the following problem: all of our traces end up being appended to a single trace (one long trail of traces) instead of standing alone.

It would be great if you could point out the right way to do this.
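For what it's worth, a possible variant of the workaround above: in Langfuse, `session_id` groups multiple traces into one session, which may be why the traces pile up into a single long trail; tagging each trace with `user_id` instead keeps them standalone. Below is a minimal sketch with the id extraction factored out. The helper name `trace_id_from_url` is my own invention, and the `Langfuse` client call is left commented out since it needs API keys; this assumes the Langfuse v2 Python SDK, where `langfuse.trace()` accepts both `id` and `user_id`.

```python
from urllib.parse import urlparse


def trace_id_from_url(trace_url: str) -> str:
    """Extract the trace id (the last path segment) from a Langfuse trace URL."""
    return urlparse(trace_url).path.rstrip("/").split("/")[-1]


# After running the pipeline:
#
#   from langfuse import Langfuse
#
#   trace_id = trace_id_from_url(response["tracer"]["trace_url"])
#   langfuse = Langfuse()
#   # user_id tags the individual trace; session_id would group traces
#   # into one session, which looks like "one long trail of traces".
#   langfuse.trace(id=trace_id, user_id=context["user_id"])

print(trace_id_from_url("https://cloud.langfuse.com/project/p1/traces/abc123"))
```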

Thanks.