[bug] opentelemetry incompatibilities with LLM semantics

codefromthecrypt commented 1 month ago

Describe the bug As a first timer, I tried the openai instrumentation, and sent a trace to a local collector (using ollama as the backend). Then I compared the output with llm semantics defined by otel. I noticed some incompatibilities and some attributes not yet defined.

compatible: none

incompatible:

kind=internal (should be client)
name=ChatCompletion (should be 'chat codegemma:2b-code')
attributes['llm.input_messages.0.message.content']='<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>' (should be the event attribute gen_ai.prompt)
attributes['llm.output_messages.0.message.content']='print("hello world")' (should be the event attribute gen_ai.completion)
attributes['llm.token_count.prompt']=24 (should be 'gen_ai.usage.input_tokens')
attributes['llm.token_count.completion']=11 (should be 'gen_ai.usage.output_tokens')
attributes['llm.model_name']='codegemma:2b-code' (should be the the span attribute 'gen_ai.request.model')

not yet defined in the standard:

attributes['openinference.span.kind']='LLM'
attributes['input.value']: '{"messages": [{"role": "user", "content": "<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>"}], "model": "codegemma:2b-code"}'
attributes['input.mime_type']='application/json'
attributes['output.value']: '{"id":"chatcmpl-646","choices":[{"finish_reason":"stop","index":0,"message":{"content":"print(\"hello world\")","role":"assistant"}}],"created":1721290309,"model":"codegemma:2b-code","object":"chat.completion","system_fingerprint":"fp_ollama","usage":{"completion_tokens":11,"prompt_tokens":24,"total_tokens":35}}'
attributes['output.mime_type']='application/json'
attributes['llm.invocation_parameters']='{"model": "codegemma:2b-code"}
attributes['llm.input_messages.0.message.role']='user'
attributes['llm.output_messages.0.message.role']='assistant'
attributes['llm.token_count.total']=35

To Reproduce You can use a program like this:

import os
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

def initialize_tracer():
    # Set the service name such that it is different from other experiments
    resource = Resource(attributes={'service.name': 'openinference-python-ollama'})
    trace.set_tracer_provider(TracerProvider(resource=resource))
    # Default the standard ENV variable to localhost
    otlp_endpoint = os.getenv('OTEL_EXPORTER_OTLP_TRACES_ENDPOINT', 'http://localhost:4318/v1/traces')
    otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint)
    # Don't batch spans, as this is a demo
    trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(otlp_exporter))

def instrument_openai():
    OpenAIInstrumentor().instrument()

def chat_with_ollama():
    ollama_host = os.getenv('OLLAMA_HOST', 'localhost')
    # Use the OpenAI endpoint, not the Ollama API.
    base_url = 'http://' + ollama_host + ':11434/v1'
    client = OpenAI(base_url=base_url, api_key='unused')
    messages = [
      {
        'role': 'user',
        'content': '<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>',
      },
    ]
    chat_completion = client.chat.completions.create(model='codegemma:2b-code', messages=messages)
    print(chat_completion.choices[0].message.content)

def main():
    initialize_tracer()
    instrument_openai()
    chat_with_ollama()

if __name__ == '__main__':
    main()

Expected behavior I would expect semantics to extend, not clash with otel LLM ones. Ack this is a moving target as the otel ones change frequently.

Screenshots

Example collector log

otel-collector      | 2024-07-18T08:11:49.239Z  info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 1}
otel-collector      | 2024-07-18T08:11:49.239Z  info    ResourceSpans #0
otel-collector      | Resource SchemaURL: 
otel-collector      | Resource attributes:
otel-collector      |      -> service.name: Str(openinference-python-ollama)
otel-collector      | ScopeSpans #0
otel-collector      | ScopeSpans SchemaURL: 
otel-collector      | InstrumentationScope openinference.instrumentation.openai 0.1.8
otel-collector      | Span #0
otel-collector      |     Trace ID       : 88ab89df040772d339a2373c137b745c
otel-collector      |     Parent ID      : 
otel-collector      |     ID             : 2f2fbc89545a9444
otel-collector      |     Name           : ChatCompletion
otel-collector      |     Kind           : Internal
otel-collector      |     Start time     : 2024-07-18 08:11:48.540101 +0000 UTC
otel-collector      |     End time       : 2024-07-18 08:11:49.256843 +0000 UTC
otel-collector      |     Status code    : Ok
otel-collector      |     Status message : 
otel-collector      | Attributes:
otel-collector      |      -> openinference.span.kind: Str(LLM)
otel-collector      |      -> input.value: Str({"messages": [{"role": "user", "content": "<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>"}], "model": "codegemma:2b-code"})
otel-collector      |      -> input.mime_type: Str(application/json)
otel-collector      |      -> output.value: Str({"id":"chatcmpl-646","choices":[{"finish_reason":"stop","index":0,"message":{"content":"print(\"hello world\")","role":"assistant"}}],"created":1721290309,"model":"codegemma:2b-code","object":"chat.completion","system_fingerprint":"fp_ollama","usage":{"completion_tokens":11,"prompt_tokens":24,"total_tokens":35}})
otel-collector      |      -> output.mime_type: Str(application/json)
otel-collector      |      -> llm.invocation_parameters: Str({"model": "codegemma:2b-code"})
otel-collector      |      -> llm.input_messages.0.message.role: Str(user)
otel-collector      |      -> llm.input_messages.0.message.content: Str(<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>)
otel-collector      |      -> llm.model_name: Str(codegemma:2b-code)
otel-collector      |      -> llm.token_count.total: Int(35)
otel-collector      |      -> llm.token_count.prompt: Int(24)
otel-collector      |      -> llm.token_count.completion: Int(11)
otel-collector      |      -> llm.output_messages.0.message.role: Str(assistant)
otel-collector      |      -> llm.output_messages.0.message.content: Str(print("hello world"))
otel-collector      |   {"kind": "exporter", "data_type": "traces", "name": "debug"}

Desktop (please complete the following information):

OS: [e.g. iOS]
Version [e.g. 22]

Additional context

The otel semantics are defined by "Semantic Conventions: LLM" and folks doing the changes there are frequently on the slack channel, in case you have any questions. I personally haven't yet made any changes to the LLM semantics.

https://github.com/open-telemetry/community?tab=readme-ov-file#specification-sigs

mikeldking commented 1 month ago

Hey @codefromthecrypt, thanks for this and good callout. This project actually pre-dates the gen_ai experimental conventions and as such you are right, they do not align. We are planning on getting involved with the working groups that are making progress and reconciling the differences with our conventions. Thanks for the very helpful links - appreciate it.

codefromthecrypt commented 1 month ago

Thanks @mikeldking for the feedback and hope to see you around the #otel-llm-semconv-wg slack or any SIG meetings.

Arize-ai / openinference

[bug] opentelemetry incompatibilities with LLM semantics #604