DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

LLM Observability should propagate custom tags and metadata #10376

Closed JensRoland closed 2 months ago

JensRoland commented 3 months ago

Summary of problem

Logging LLM traces to DataDog is useful, but would be much better with custom metadata / tags. My organisation wants to analyse logs separately for different teams and applications, so the ability to tag the data in DataDog is fairly crucial.

We proxy all LLM requests through a single central service based on LangChain. This service gets the team and application names from custom HTTP headers. As such, we want to dynamically pass these values as metadata when we invoke the LLMs.
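For context, a minimal sketch of how such a proxy endpoint might read those headers and forward them as LangChain metadata (FastAPI is used here only for illustration; the endpoint path, header names X-Team-Name / X-App-Name, body schema, and model name are all assumptions):

    from fastapi import FastAPI, Header
    from langchain_openai import ChatOpenAI

    app = FastAPI()
    llm = ChatOpenAI(model="gpt-4o-mini")

    @app.post("/complete")
    async def complete(
        body: dict,
        x_team_name: str = Header(...),
        x_app_name: str = Header(...),
    ) -> dict:
        # Team/app arrive per request via custom headers, so they have to be
        # attached dynamically to each LLM invocation rather than at startup.
        result = llm.invoke(
            body["prompt"],
            config={"metadata": {"team": x_team_name, "app": x_app_name}},
        )
        return {"content": result.content}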

However, when we use the automatic integration for LangChain, we don't have access to any of the annotations, and if we try to annotate the spans manually, the logs lose the input and output fields. Perhaps there is a way to do this, but we have not had any luck with LLMObs.annotate or workflow spans. Looking at the dd-trace code, it seems the integration only supports two metadata fields: temperature and max_tokens.
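A sketch of the manual approach being described, assuming the LLMObs.workflow context manager and LLMObs.annotate call from ddtrace.llmobs (parameter names are best-effort and not verified against 2.11.1):

    from ddtrace.llmobs import LLMObs

    def call_llm(llm, messages, team: str, app: str):
        # Wrap the auto-instrumented LangChain call in a manually created
        # workflow span and annotate it with the custom tags/metadata. As
        # described above, this loses the input/output fields on the logs.
        with LLMObs.workflow(name="llm-proxy-request") as span:
            LLMObs.annotate(
                span=span,
                tags={"team": team, "app": app},
                metadata={"team": team, "app": app},
            )
            return llm.invoke(messages)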

When invoking an LLM with LangChain, you can actually pass metadata, either by passing a RunnableConfig to invoke or by setting it directly on BaseChatModel constructors like ChatOpenAI. I would expect this metadata to be passed on to DataDog automatically (this could be configurable).
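For clarity, the two LangChain routes mentioned above (a sketch; model name and metadata values are illustrative):

    from langchain_core.runnables import RunnableConfig
    from langchain_openai import ChatOpenAI

    # Option 1: per-invocation, via RunnableConfig
    llm = ChatOpenAI(model="gpt-4o-mini")
    llm.invoke(
        "Hello",
        RunnableConfig(metadata={"team": "payments", "app": "billing-bot"}),
    )

    # Option 2: at construction time, via the BaseChatModel fields
    llm = ChatOpenAI(
        model="gpt-4o-mini",
        metadata={"team": "payments", "app": "billing-bot"},
        tags=["team:payments", "app:billing-bot"],
    )
    llm.invoke("Hello")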

Or, alternatively, it would be great to have a context manager for setting a default ml_app, metadata, and tags for all invocations (automatic or otherwise) within its scope, à la:

with LLMObs.context({
    "ml_app": app_name,
    "metadata": {
        "team": team_name,
        "cost_center": ccid,
        "customer": tenant_org_id,
    },
    "tags": [],
}):
    llm.invoke(messages)

Which version of dd-trace-py are you using?

2.11.1

Which version of pip are you using?

24.0

Which libraries and their versions are you using?

`pip freeze` aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 astroid==3.2.4 attrs==24.2.0 black==24.8.0 boto3==1.34.162 boto3-stubs==1.35.3 botocore==1.34.162 botocore-stubs==1.35.3 bytecode==0.15.1 cattrs==23.2.3 certifi==2024.7.4 cfgv==3.4.0 charset-normalizer==3.3.2 click==8.1.7 coverage==7.6.1 dataclasses-json==0.6.7 ddsketch==3.0.1 ddtrace==2.11.1 Deprecated==1.2.14 dill==0.3.8 distlib==0.3.8 distro==1.9.0 dnspython==2.6.1 dodgy==0.2.1 email_validator==2.2.0 envier==0.5.2 fastapi==0.111.1 fastapi-cli==0.0.5 filelock==3.15.4 flake8==5.0.4 flake8-polyfill==1.0.2 frozenlist==1.4.1 gitdb==4.0.11 GitPython==3.1.43 h11==0.14.0 httpcore==1.0.5 httptools==0.6.1 httpx==0.27.0 identify==2.6.0 idna==3.7 importlib_metadata==8.0.0 iniconfig==2.0.0 isort==5.13.2 Jinja2==3.1.4 jiter==0.5.0 jmespath==1.0.1 jsonpatch==1.33 jsonpointer==3.0.0 langchain==0.2.14 langchain-aws==0.1.17 langchain-community==0.2.6 langchain-core==0.2.34 langchain-openai==0.1.22 langchain-text-splitters==0.2.2 langsmith==0.1.101 lazy-object-proxy==1.10.0 markdown-it-py==3.0.0 MarkupSafe==2.1.5 marshmallow==3.21.3 mccabe==0.7.0 mdurl==0.1.2 multidict==6.0.5 mypy==1.11.1 mypy-boto3==1.35.3 mypy-boto3-bedrock-runtime==1.35.0 mypy-extensions==1.0.0 nodeenv==1.9.1 numpy==1.26.4 openai==1.42.0 opentelemetry-api==1.26.0 orjson==3.10.7 packaging==24.1 pathspec==0.12.1 pep8-naming==0.10.0 platformdirs==4.2.2 pluggy==1.5.0 pre-commit==3.8.0 prospector==1.10.3 protobuf==5.27.3 pycodestyle==2.9.1 pydantic==2.8.2 pydantic-settings==2.4.0 pydantic_core==2.20.1 pydocstyle==6.3.0 pyflakes==2.5.0 Pygments==2.18.0 PyJWT==2.9.0 pylint==3.2.6 pylint-celery==0.3 pylint-django==2.5.3 pylint-flask==0.6 pylint-plugin-utils==0.7 pytest==8.3.2 pytest-cov==5.0.0 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 python-json-logger==2.0.7 python-multipart==0.0.9 PyYAML==6.0.2 regex==2024.7.24 requests==2.32.3 requirements-detector==1.2.2 rich==13.7.1 ruff==0.4.10 s3transfer==0.10.2 semver==3.0.2 setoptconf-tmp==0.3.1 setuptools==70.3.0 shellingham==1.5.4 six==1.16.0 smart-open==7.0.4 smmap==5.0.1 sniffio==1.3.1 snowballstemmer==2.2.0 SQLAlchemy==2.0.32 starlette==0.37.2 tenacity==8.5.0 tiktoken==0.7.0 toml==0.10.2 tomlkit==0.13.2 tqdm==4.66.5 typer==0.12.4 types-awscrt==0.21.2 types-PyYAML==6.0.12.20240808 types-s3transfer==0.10.1 typing-inspect==0.9.0 typing_extensions==4.12.2 ujson==5.10.0 urllib3==2.2.2 uvicorn==0.30.6 uvloop==0.20.0 virtualenv==20.26.3 watchfiles==0.23.0 websockets==13.0 wrapt==1.16.0 xmltodict==0.13.0 yarl==1.9.4 zipp==3.20.0

How can we reproduce your problem?

  1. Import ddtrace, langchain, and langchain-openai
  2. Enable with LLMObs.enable(ml_app="llm-service") (we use an agent, but I don't think that makes a difference)
  3. Instantiate ChatOpenAI and run invoke like this:
        llm_completion: BaseMessage = llm.invoke(
            messages,
            RunnableConfig(
                tags=[f"team:{team}", f"app:{app}"],
                metadata={"team": team, "app": app},
            ),
        )
  4. Run with ddtrace-run (a consolidated sketch of these steps follows after this list)
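Putting the steps together, a minimal end-to-end repro looks roughly like this (run with `ddtrace-run python repro.py`; model name, message content, and team/app values are illustrative, and an OpenAI API key is assumed to be configured):

    from ddtrace.llmobs import LLMObs
    from langchain_core.messages import BaseMessage, HumanMessage
    from langchain_core.runnables import RunnableConfig
    from langchain_openai import ChatOpenAI

    LLMObs.enable(ml_app="llm-service")

    team, app = "payments", "billing-bot"
    llm = ChatOpenAI(model="gpt-4o-mini")
    messages = [HumanMessage(content="Say hello")]

    # The tags/metadata below never show up on the generated LLM span;
    # only temperature and max_tokens are reported as metadata.
    llm_completion: BaseMessage = llm.invoke(
        messages,
        RunnableConfig(
            tags=[f"team:{team}", f"app:{app}"],
            metadata={"team": team, "app": app},
        ),
    )
    print(llm_completion.content)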

What is the result that you get?

DataDog trace metadata shows only temperature and max_tokens

What is the result that you ~expected~ hoped for?

DataDog trace metadata showing temperature and max_tokens, and also team and app

Yun-Kim commented 3 months ago

Hi @JensRoland, thanks for reaching out! We are currently looking into allowing users to annotate or customize integration-generated spans via context managers. In the meantime, we'll also add functionality to capture the metadata argument in LLM/chat model invocations.

lievan commented 2 months ago

Hi @JensRoland, the ability to use a context manager to tag auto-instrumented LLM Observability spans has been introduced with this PR

This feature should be released within the next few weeks with ddtrace 2.14.0

We're working on adding support for annotating metadata and ml_app for auto-instrumented spans as well. Thanks for noting these use cases.
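If the feature ships in that form, usage might look roughly like the following (a sketch assuming the context manager is exposed as LLMObs.annotation_context with a tags argument; the exact name and signature are assumptions until the 2.14.0 release notes are out):

    from ddtrace.llmobs import LLMObs

    # llm / messages / team / app as in the repro above; the auto-instrumented
    # LangChain span created inside the block would pick up these tags.
    with LLMObs.annotation_context(tags={"team": team, "app": app}):
        llm.invoke(messages)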

Yun-Kim commented 2 months ago

Going to close this for now. Please give this a try once ddtrace==2.14.0 is released, and feel free to reopen this ticket (or a new one) if you still have trouble with this use case!