Arize-ai / openinference

OpenTelemetry Instrumentation for AI Observability
https://arize-ai.github.io/openinference/
Apache License 2.0

[ai-sdk] [docs] map out `ai` semantic conventions and how it maps to openinference #836

Closed: mikeldking closed this issue 1 month ago

Parker-Stafford commented 1 month ago

Vercel AI SDK Semantic Conventions to OpenInference Semantic Conventions

The following is a map from the current Vercel AI SDK OpenTelemetry semantic conventions to the OpenInference semantic conventions. Relevant references:

- Vercel AI SDK Semantic Conventions
- OpenInference Semantic Conventions

Vercel captures some basic info for LLM, embedding, and tool call spans. Additionally, it captures more specific data depending on which AI SDK function (e.g., generateText) you are calling.
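
For reference, telemetry in the AI SDK is opt-in per call via the `experimental_telemetry` option. A minimal sketch of where these attributes originate in application code (the model, prompt, and ids below are illustrative):

```typescript
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";

async function main() {
  const { text } = await generateText({
    model: openai("gpt-4o-mini"), // -> ai.model.id / ai.model.provider
    prompt: "Write a haiku about telemetry.", // -> ai.prompt
    experimental_telemetry: {
      isEnabled: true,
      functionId: "haiku-demo", // -> ai.telemetry.functionId (and resource.name)
      metadata: { userId: "user-123" }, // -> ai.telemetry.metadata.*
    },
  });
  console.log(text);
}

main();
```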

LLM Spans

| Vercel Attribute | OpenInference Attribute | Additional Info |
| --- | --- | --- |
| `ai.model.id` | `llm.model_name` | |
| `ai.model.provider` | N/A | |
| `ai.request.headers.*` | N/A | Can use OTEL semconv (`http.request.header.<key>`) |
| `ai.settings.<key>` | `llm.invocation_parameters.<key>` | |
| `ai.settings.maxRetries` | `llm.invocation_parameters` | |
| `ai.telemetry.functionId` | `name` | This is not an LLM function; it is the id of the function called in application code |
| `ai.telemetry.metadata.*` | `metadata` | |
| `ai.usage.completionTokens` | `llm.token_count.completion` | |
| `ai.usage.promptTokens` | `llm.token_count.prompt` | |
| `resource.name` | `name` | Set in the same way as `ai.telemetry.functionId` (see above) |
| `ai.prompt` | `input.value` | Use with `input.mime_type=text/plain` |
| `ai.result.text` | `output.value` | |
| `ai.result.toolCalls` | `llm.output_messages.message.tool_calls` | This is on the result for Vercel; we could map it to input, output, or top-level tool calls |
| `ai.finishReason` | N/A | |
| `ai.settings.maxToolRoundtrips` | `llm.invocation_parameters` | This could go in invocation parameters |
| `operation.name` | `name` / `openinference.span.kind` | A combination of the SDK function called (e.g., `ai.generateText`) and the functionId (see above). Since this value is formatted like `ai.<sdk-function-name>.<function-id>`, we can also use it to determine the span kind (see the sketch after this table) |
| `ai.prompt.format` | N/A | The format of the prompt. Not sure exactly what this means; maybe how template variables are specified. This comes from `ai.generateText.doGenerate`. TODO: look into what this value could be |
| `ai.prompt.messages` | `llm.input_messages` | |
| `ai.schema` | N/A | Captured on the `generateObject` function. Vercel description: "Stringified JSON schema version of the schema that was passed into the generateObject function" |
| `ai.schema.name` | N/A | (see above) |
| `ai.schema.description` | N/A | (see above) |
| `ai.result.object` | `output.value` | Use with `output.mime_type=application/json` |
| `ai.settings.mode` | N/A | Captured on the `generateObject` function. Vercel description: "the object generation mode" |
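
Since `operation.name` encodes the SDK function that produced the span, a small helper could derive the OpenInference span kind from it. A hypothetical sketch (the function name and fallbacks are ours, not part of either convention):

```typescript
// Assumes operation.name is formatted like "ai.<sdk-function-name>.<function-id>".
type OpenInferenceSpanKind = "LLM" | "EMBEDDING" | "TOOL" | "CHAIN";

function spanKindFromOperationName(operationName: string): OpenInferenceSpanKind {
  // e.g., "ai.generateText.myFunctionId" -> "generateText"
  const sdkFunction = operationName.split(".")[1] ?? "";
  if (sdkFunction.startsWith("embed")) return "EMBEDDING"; // embed, embedMany
  if (sdkFunction.startsWith("toolCall")) return "TOOL";
  if (sdkFunction.startsWith("generate") || sdkFunction.startsWith("stream")) {
    return "LLM"; // generateText, streamText, generateObject, streamObject
  }
  return "CHAIN"; // fallback for anything unrecognized
}
```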
Embedding Spans

| Vercel Attribute | OpenInference Attribute | Additional Info |
| --- | --- | --- |
| `ai.model.id` | `embedding.model_name` | |
| `ai.model.provider` | N/A | |
| `ai.request.headers.*` | N/A | Can use OTEL semconv (`http.request.header.<key>`) |
| `ai.settings.maxRetries` | N/A | |
| `ai.telemetry.functionId` | `name` | This is not an LLM function; it is the id of the function called in application code |
| `ai.telemetry.metadata.*` | `metadata` | |
| `ai.usage.tokens` | N/A | For embeddings we don't currently track tokens, but we do capture the vector. Vercel's explanation of this attribute: "the number of tokens that were used" |
| `resource.name` | `name` | Set in the same way as `ai.telemetry.functionId` (see above) |
| `ai.value` | `embedding.text` | |
| `ai.embedding` | `embedding.vector` | |
| `ai.values` | `embedding.embeddings.{i}.embedding.text` | Indexed per input (see the sketch after this table) |
| `ai.embeddings` | `embedding.embeddings.{i}.embedding.vector` | Indexed per input |
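
`ai.values` and `ai.embeddings` are parallel arrays, so the indexed OpenInference attributes could be produced roughly as follows (a sketch; the function name is ours, and we assume the JSON-stringified vectors Vercel records have already been parsed):

```typescript
// Flatten parallel arrays from an embedMany call into OpenInference's
// indexed embedding attributes.
function mapEmbeddingAttributes(
  values: string[], // from ai.values
  embeddings: number[][] // from ai.embeddings (parsed from JSON strings)
): Record<string, string | number[]> {
  const attributes: Record<string, string | number[]> = {};
  values.forEach((text, i) => {
    attributes[`embedding.embeddings.${i}.embedding.text`] = text;
    attributes[`embedding.embeddings.${i}.embedding.vector`] = embeddings[i];
  });
  return attributes;
}
```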
Tool Call Spans

| Vercel Attribute | OpenInference Attribute | Additional Info |
| --- | --- | --- |
| `ai.toolCall.name` | `tool.name` | |
| `ai.toolCall.id` | N/A | OpenInference does not currently capture this |
| `ai.toolCall.args` | `tool.parameters` | Although named "args" by Vercel, their description is "the parameters of the tool call" |
| `ai.toolCall.output` | `output.value` | Use with `output.mime_type=application/json`. Only available from Vercel if the "tool call is successful and the result is serializable" |
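
The tool call attributes are one-to-one renames, so a simple lookup table covers them (a sketch; the constant name is ours):

```typescript
// Direct attribute renames for tool call spans, per the table above.
const TOOL_CALL_ATTRIBUTE_MAP: Record<string, string> = {
  "ai.toolCall.name": "tool.name",
  "ai.toolCall.args": "tool.parameters",
  "ai.toolCall.output": "output.value", // also set output.mime_type="application/json"
  // "ai.toolCall.id" has no OpenInference equivalent today
};
```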

Standardized gen_ai Attributes

Vercel also specifies some "standardized gen_ai attributes". Presumably these refer to the OpenTelemetry GenAI semantic conventions (https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/); however, the links in Vercel's docs (see link above) are broken, and there is a slight mismatch between the two sets of conventions. That said, below is the mapping from the gen_ai attributes as currently outlined in the Vercel documentation (which is not exhaustive compared to the spec linked above) to OpenInference semantic conventions.

| gen_ai Attribute | OpenInference Attribute | Additional Info |
| --- | --- | --- |
| `gen_ai.request.model` | `llm.model_name` | |
| `gen_ai.response.finish_reasons` | N/A | |
| `gen_ai.system` | N/A | Otherwise known as the provider (e.g., openai) |
| `gen_ai.usage.completion_tokens` | `llm.token_count.completion` | Does not match the conventions in the spec linked above, where this is called `gen_ai.usage.output_tokens` |
| `gen_ai.usage.prompt_tokens` | `llm.token_count.prompt` | Does not match the conventions in the spec linked above, where this is called `gen_ai.usage.input_tokens` |
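
Given the naming mismatch, a mapping layer could accept both the Vercel-documented names and the current spec names (a sketch; nothing here is normative):

```typescript
// Translate gen_ai token-count attributes to OpenInference, accepting both
// the names in Vercel's docs and the renamed ones in the OTel gen_ai spec.
const GEN_AI_ATTRIBUTE_MAP: Record<string, string> = {
  "gen_ai.request.model": "llm.model_name",
  "gen_ai.usage.completion_tokens": "llm.token_count.completion", // Vercel docs
  "gen_ai.usage.output_tokens": "llm.token_count.completion", // OTel spec
  "gen_ai.usage.prompt_tokens": "llm.token_count.prompt", // Vercel docs
  "gen_ai.usage.input_tokens": "llm.token_count.prompt", // OTel spec
};
```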
Parker-Stafford commented 1 month ago

Closing as complete, see note above. Package and mapping here: https://github.com/Arize-ai/openinference/tree/main/js/packages/openinference-vercel