[Sandbox] OpenLLMetry - Githubissues

Application contact emails

nir@traceloop.com, gal@traceloop.com

Project Summary

Open-protocol extension for OpenTelemetry for observability and evaluation of GenAI applications

Project Description

OpenLLMetry started as we were looking for an easy way to send metrics and traces of executions of LLM applications. Many common patterns like RAG applications closely resemble micro-services architecture, so it seemed natural to rely on OpenTelemetry. Moreover, LLMs and GenAI systems often exist as a component within a larger system. Understanding both how these systems influence an LLM’s output, and how the LLM’s output influences the rest of a system, is essential for people building applications using this technology. OpenTelemetry provides the standard and toolset to enable this kind of understanding. However, the standard set by other LLM observability applications like LangSmith required us to send additional information (like prompts and completions) on spans, which is subject to an ongoing debate in the OpenTelemetry community. Thus, we decided to extend OpenTelemetry semantic conventions, and build a way to monitor and trace prompts, completions, calls to vector DBs, monitoring token usage and more; all while staying compliant with the OTel standard. This allowed us to be fully compatible with any platform that supports OpenTelemetry, while offering the same level of features provided by LLM-specific observability applications. See also a blog post we published about this project.

Org repo URL (provide if all repos under the org are in scope of the application)

N/A

Roadmap context

No response

Contributing Guide

https://www.traceloop.com/docs/openllmetry/contributing/overview

Code of Conduct (CoC)

https://github.com/traceloop/openllmetry/blob/main/CODE_OF_CONDUCT.md

Adopters

No response

Contributing or Sponsoring Org

https://www.traceloop.com/

Maintainers file

https://github.com/traceloop/openllmetry/blob/main/MAINTAINERS.md

IP Policy

[X] If the project is accepted, I agree the project will follow the CNCF IP Policy

Trademark and accounts

[X] If the project is accepted, I agree to donate all project trademarks and accounts to the CNCF

Why CNCF?

We see the project as a natural extension of OpenTelemetry, and our hope is that with its maturity it may even fully integrate into OpenTelemetry. By moving this project under the CNCF umbrella, we allow this project to continue to evolve in synergy with OpenTelemetry. Moreover, seeing how OpenTelemetry has changed the cloud observability landscape, providing much needed freedom and flexibility to users, we seek to do the same in the LLM observability domain, which is rapidly evolving but it’s still in its early stages. Our hope is that under the CNCF umbrella it will become easier for other vendors to adopt this as a standard, instead of opting for a proprietary protocol as many do today.

Benefit to the Landscape

OpenLLMetry extends the current CNCF observability landscape into the rapidly evolving gen AI domain. It provides a novel approach for tracing and monitoring LLM components like foundation models and vector databases which couldn’t be done with existing CNCF tools due to inherent limitations. And by basing these capabilities on OpenTelemetry, OpenLLMetry stays true to the CNCF’s mission of providing open standards that any developer can rely upon.

Cloud Native 'Fit'

OpenLLMetry is built with cloud-native technologies like OpenTelemetry and fits in the Observability and Analysis area. Additionally, it’s compatible with the nascent AI TAG forming - and within that TAG, Observability is also seen as an important area for AI.

Cloud Native 'Integration'

OpenLLMetry depends on OpenTelemetry as it extends it and is fully compatible with it. By emitting data as OTLP, OpenLLMetry is thus compatible with a wide array of tools, both open source (e.g., Jaeger) and proprietary.

Cloud Native Overlap

While there is an apparent overlap with OpenTelemetry, as noted before, we see this more of an extension and a complement to OpenTelemetry. There were some extensions that were mandatory for OpenLLMetry to work properly which couldn’t be implemented directly in OpenTelemetry given its scale and usage. For example, adding full prompts to spans makes sense for OpenLLMetry given the scale and number of traces per minute is much lower than a microservice application. Additionally, the AI space is moving at an incredibly fast pace today, necessitating broad and sweeping changes in the OpenLLMetry project if/when things change in upstream components. Because the OpenTelemetry project is seeking greater stability across all its components, this need to made rapid and broad changes may conflict with the current focus of many OpenTelemetry projects.

Similar projects

LangSmith - LangChain's proprietary platform for LLM observability. Natively integrates with the open-source LangChain framework for building LLM applications, but otherwise requires manually logging and tracing.

LangFuse - Open source platform for LLM observability. Uses a proprietary protocol, and does not support auto-instrumentation.

Arize Phoenix - More of an MLOps project, but does use OpenTelemetry code repurposed to fit a proprietary protocol today.

Landscape

Yes

Business Product or Service to Project separation

Traceloop, the product we’re building, is a destination for the OpenLLMetry SDK. It uses traces and metrics to provide its users with tools to evaluate the quality of model outputs and iterate on changes they’re making to their applications. Similar to other destinations that natively integrate with OpenTelemetry like Honeycomb, Lightstep, Splunk, Sentry, and others.

Project presentations

The project was presented to TAG Observability on 5/12/2023 (cc @halcyondude) https://youtu.be/ksmPWR_ZybE?si=Z10pzfNf2QIDyZ3J

Project champions

Chris Aniszczyk

Additional information

No response

This project provides good functionalities. I would like to see the long-term roadmaps but the roadmap link does not work. I see development is ongoing. Could you provide information on how adoption is going as well as any plan for community growth beyond one company's contribution? I do not see a governance document. Could you add that? Have you presented it to the Tag-observability as stated in the "Project presentations" section? If so, what is the TAG feedback for your presentation?

Thanks @cathyhongzhang. Changed the roadmap privacy setting and added a governance doc.

I just presented on the TAG-observability a few hours ago, and will update here with the presentation and comments.

Can you explain why OpenLLMetry should be an autonomous CNCF project, rather than a subproject of OpenTelemetry?

Sure @jberkus. While our initial focus is only tracing and observability for LLM apps, we do want to work on standardizing other LLM related protocols in the near future which we thing closely relate to observability for LLMs. For example, prompt and model configuration. These may not be tied to opentelemetry necessarily.

Feedback re: (this) Sandbox Application
Feedback from TAG Observability Presentation on 12/5
Additional Feedback and Concern(s)
Conclusion

As mentioned above, the project was presented to TAG Observabilty on our 2023-12-05 meeting. There was some discussion and feedback from TAG members, both in the meeting and in subsequent conversations. I'll cover some of this feedback below.

I've also taken some time to review the presentation, materials, this proposal, and the github repositories, and have feedback as well.

The project's goals around addressing an emerging concern around how to observe LLM based applications are clear. I've grouped my feedback into three sections:

response to the Sandbox Application (as proposed)
feedback from the TAG Observabilty Meeting on 12/5
my own hopes and concerns

Feedback re: (this) Sandbox Application

We see the project as a natural extension of OpenTelemetry, and our hope is that with its maturity it may even fully integrate into OpenTelemetry.

If this project is an extension to open-telemetry, then success as defined would obviate the need for a large portion of the project upon it's integration with open-telemetry (either via one of the project's *-contrib repositories, open-telemetry/semantic-conventions, or elsewhere).

By moving this project under the CNCF umbrella, we allow this project to continue to evolve in synergy with OpenTelemetry.

Entering the CNCF Sandbox is not requisite for evolution and/or integration with open-telemetry, nor is that laudable goal itself a reason to join.

Our hope is that under the CNCF umbrella it will become easier for other vendors to adopt this as a standard, instead of opting for a proprietary protocol as many do today.

Vendor and End User community adoption will require engaging with those communities, and welcoming them to participate in the project. Presently the bulk of the contributions have come from two TraceLoop employees (CEO, CTO). Moreover there's an implication that Vendors (today) do not find it easy to adopt the set of conventions because the project isn't in the Sandbox. One would not would expect them to adopt because of Sandbox membership, they might adopt the proposed open-telemetry Semantic Conventions if the project worked with open-telemetry to land them.

OpenLLMetry extends the current CNCF observability landscape into the rapidly evolving gen AI domain. It provides a novel approach for tracing and monitoring LLM components like foundation models and vector databases which couldn’t be done with existing CNCF tools due to inherent limitations.

What's referred to as "inherent limitations" is an open PR (https://github.com/open-telemetry/oteps/pull/234) that's seen substantive and constructive feedback from the project which have been argued with in some cases (without resolution) or are as yet unaddressed.

... And by basing these capabilities on OpenTelemetry, OpenLLMetry stays true to the CNCF’s mission of providing open standards that any developer can rely upon.

The CNCF's mission isn't (specifically) to provide reliable open standards, although that is an outcome with projects like open-telemetry. The API's and Semantic Conventions provided by otel are widely and broadly adopted because they are the result of a consensus based process which has actively solicited feedback from, and has over time consistently engaged with Vendors and End Users; the former as they comprise a large portion of the engineering contribution(s), and the latter by being modelled as "the customer," with active community engagement and by demonstrating responsive feedback and investments in open, community curated and driven documentation.

OpenLLMetry is built with cloud-native technologies like OpenTelemetry and fits in the Observability and Analysis area. Additionally, it’s compatible with the nascent AI TAG forming - and within that TAG, Observability is also seen as an important area for AI.

TAG Observability and TAG Runtime are the hosting/supporting TAG's for the proposed AI working group, as this emerging area is within scope for each TAG's charter and the Working Group's focus bridges the 2 core domains.

OpenLLMetry depends on OpenTelemetry as it extends it and is fully compatible with it. By emitting data as OTLP, OpenLLMetry is thus compatible with a wide array of tools, both open source (e.g., Jaeger) and proprietary.

OpenLLMetry is effectively carrying unmerged, unaccepted patches of open-telemetry's semantic conventions, making it not "fully compatible" with open-telemetry. It's unclear if it's a project goal to work with the open-telemetry community to land them, or to integrate as a plugin or extension, or to simply act as a de facto perpetual fork. I would encourage the OpenLLMetry project to engage with open-telemetry. The referenced pull request is a conversation in that dialog and is a great start!

While there is an apparent overlap with OpenTelemetry, as noted before, we see this more of an extension and a complement to OpenTelemetry.

This isn't consistent with the presentation to TAG Observability (see slide)

There were some extensions that were mandatory for OpenLLMetry to work properly which couldn’t be implemented directly in OpenTelemetry given its scale and usage. For example, adding full prompts to spans makes sense for OpenLLMetry given the scale and number of traces per minute is much lower than a microservice application.

The design descision(s) mentioned above might best be discussed with the community. In the TAG meeting there were a few suggestions for other ways to acheive the goals. I also would like to understand some of the rationale behind the statement above, which was reiterated in the TAG meeting. It was suggested that because the number of spans present in these applications is so much lower than is typical for open-telemetry, the requirements it imposes around payload sizes for trace spans "don't apply." If the TraceLoop SDK (OpenLLMetry OSS SDK is named after TraceLoop (company) not the project) were used to instrument cloud native LLM applications - as are in-scope for the AI working group, why would the scale be so much lower? Given the rapid growth of LLM and AI cloud native applications and services, one might expect the traffic volumes (and number of spans generated) to be correspondingly large.

Additionally, the AI space is moving at an incredibly fast pace today, necessitating broad and sweeping changes in the OpenLLMetry project if/when things change in upstream components. Because the OpenTelemetry project is seeking greater stability across all its components, this need to made rapid and broad changes may conflict with the current focus of many OpenTelemetry projects.

If I'm understanding what was presented on 12/5 at TAG Observability correctly, this need is driven by the approach the project is taking to carrying patches for not just open-telemetry but also for the other targeted (growing) list of integrations. I would like to understand the nature of the engagement and current state of the discussion(s) for the projects being instrumented by the TraceLoop SDK. Is it collaborative? Are they aware? Do they support the project and its approach?

Traceloop, the product we’re building, is a destination for the OpenLLMetry SDK. It uses traces and metrics to provide its users with tools to evaluate the quality of model outputs and iterate on changes they’re making to their applications. Similar to other destinations that natively integrate with OpenTelemetry like Honeycomb, Lightstep, Splunk, Sentry, and others.

Without the TraceLoop product, how does one leverage the generated data data effectively (irrespective of using the OLTP protocol), as it diverges from open-telemetry's Semantic Conventions? If other open-telemetry exporters were to receive OLTP data generated by OpenLLMetry, since payloads for trace headers contain payloads larger than is supported, and employ unsupported Semantic Conventions, what's the result?
Why is the SDK (the open source project) named "TraceLoop SDK"?
Would you please confirm that the TraceLoop platform, and it's UI, are not open?
While the Vendors referenced are destinations for the storage and analysis of observability signal data, their SDK's are not proposed as Sandbox projects, again they work with open-telemetry and standardize on Semantic Conventions.

Feedback from TAG Observability Presentation on 12/5

The TraceLoop Platform UI code calls OpenAI, and uses OpenAI, not an open platform nor an model licensed with an OSI approved license as it's primary example in the Quickstart.
Some of the things being parked in a Trace Span might better be surfaced via structured logs or other signal types, with correlated and/or blending of signals via and open-telemetry collector processor, or other means.
Question about redaction of information from trace spans. In following up, it appears that redaction is provided by the commercial product, not the OSS project, via a TraceLoop API (outside the OSS project). https://www.traceloop.com/docs/openllmetry/privacy/traces: "We have an API to enable content tracing for specific users or workflows. See the Traceloop API documentation for more information." is expressed the OpenLLMetry documentation for the (open) TraceLoop SDK. This links to the closed-source Dashboard API (https://www.traceloop.com/docs/api-reference/tracing/whitelist_user)

Additional Feedback and Concern(s)
The project name is potentially misleading and confusing. Against the backdrop of OpenCensus + OpenTracing = open-telemetry, the OpenLLMetry name, while clever, could easily be construed by End Users as a subproject of open-telemetry.
As articulated above, I would like to see the project engage more deeply and substantively with the open-telemetry community - particularly as the issues raised are not trivial.
The documentation for the project should not reside behind a commercial site (https://www.traceloop.com/docs/openllmetry/introduction).
The documentation should cover how and in what cases CNCF End Users should use the SDK absent using TraceLoop's commercial (closed) platform.
The highlighted rationale gives me pause and is for me a source of concern.

Conclusion

I think that this project is forward looking, and aims to address a meaningful emerging function - the observation of LLM application workloads. I would encourage the project's maintainers to continue to engage with open-telemetry and the other libraries that OpenLLMetry is presently carrying patches for.

I would like to see the project return next cycle after addressing the feedback above. I do think the project is valuable and has a bright future! My comments above are intended as constructive feedback to the project that might help to realize that sentiment.

I can certainly speak towards this:

Without the TraceLoop product, how does one leverage the generated data data effectively (irrespective of using the OLTP protocol), as it diverges from open-telemetry's Semantic Conventions? If other open-telemetry exporters were to receive OLTP data generated by OpenLLMetry, since payloads for trace headers contain payloads larger than is supported, and employ unsupported Semantic Conventions, what's the result?

I have four thoughts here:

The size of payload is not something specified by OpenTelemetry. You can trivially create OTLP trace data that a variety of backends will fail to accept today by either having a lot of custom fields or several fields with a lot of "heavy" data (like 1KB strings). Nothing in OpenTelemetry states that you should limit your number of attribites/fields, nor that they must be under a certain size.
OpenTelemetry defines arbitrary key-value pairs as attributes that go beyond Semantic Conventions. And so the data defined by OpenLLMetry is no different than this, it's just using a package that tries to centralize some domain-specific names and values that are common across several different AI providers.
Not all OTLP backends support the entirety of the OpenTelemetry spec. Some only support metrics, but not traces. Some support special handling of HTTP semantic conventions, but not DB semantic conventions.
There is no requirement to export OTLP data with names and values defined by specified by Semantic Conventions. You can still "send otel" or "accept otel" without any kind of support for the semantic conventions.

As it stands, I think the statements OpenLLMetry made still hold up here, and I can confirm in my own use of the project that it's compatible with at least one OTLP backend that I use.

@halcyondude thanks for this extremely helpful feedback! I'd love to address some of the issues you mentioned and some of your concerns, in addition to the comments @cartermp wrote above.

Vendor and End User community adoption will require engaging with those communities, and welcoming them to participate in the project. Presently the bulk of the contributions have come from two TraceLoop employees (CEO, CTO). Moreover there's an implication that Vendors (today) do not find it easy to adopt the set of conventions because the project isn't in the Sandbox. One would not would expect them to adopt because of Sandbox membership, they might adopt the proposed open-telemetry Semantic Conventions if the project worked with open-telemetry to land them.

We are actively engaging with major vendors (Like @cartermp from Honeycomb, Dynatrace, SigNoz, New Relic and others) to make sure that OpenLLMetry is compatible and stays compatible. We also maintain a list of supported vendors and actively test that support.

Why is the SDK (the open source project) named "TraceLoop SDK"?

Would you please confirm that the TraceLoop platform, and it's UI, are not open?

The documentation for the project should not reside behind a commercial site (https://www.traceloop.com/docs/openllmetry/introduction).

These were decisions that we made earlier in the development, and will of course change when and if CNCF decides this fits under its umbrella. The docs are under our domain for convenience reasons only (small startup, etc.). There's a clear separation between OpenLLMetry docs and Traceloop docs. Traceloop is indeed a closed-source platform.

If I'm understanding what was presented on 12/5 at TAG Observability correctly, this need is driven by the approach the project is taking to carrying patches for not just open-telemetry but also for the other targeted (growing) list of integrations. I would like to understand the nature of the engagement and current state of the discussion(s) for the projects being instrumented by the TraceLoop SDK. Is it collaborative? Are they aware? Do they support the project and its approach?

Definitely. We are working with LLM app frameworks, vector DBs and foundation models like LlamaIndex, LiteLLM, Chroma DB and others to build and maintain promote and get feedback on our instrumentations.

Question about redaction of information from trace spans. In following up, it appears that redaction is provided by the commercial product, not the OSS project, via a TraceLoop API (outside the OSS project). https://www.traceloop.com/docs/openllmetry/privacy/traces: "We have an API to enable content tracing for specific users or workflows. See the Traceloop API documentation for more information." is expressed the OpenLLMetry documentation for the (open) TraceLoop SDK. This links to the closed-source Dashboard API (https://www.traceloop.com/docs/api-reference/tracing/whitelist_user)

That is inaccurate (I'll work on clarifying the docs on our end). You can choose to disable or enable content tracing on the SDK. And you can also choose to enable it selectively using the SDK. We also provide a convenient way to control that through the platform, but it's addition to the aforementioned capabilities.

The documentation should cover how and in what cases CNCF End Users should use the SDK absent using TraceLoop's commercial (closed) platform.

This is clearly stated across our repo README, our docs, and the main website.

If I'm understanding what was presented on 12/5 at TAG Observability correctly, this need is driven by the approach the project is taking to carrying patches for not just open-telemetry but also for the other targeted (growing) list of integrations. I would like to understand the nature of the engagement and current state of the discussion(s) for the projects being instrumented by the TraceLoop SDK. Is it collaborative? Are they aware? Do they support the project and its approach?

Definitely. We are working with LLM app frameworks, vector DBs and foundation models like LlamaIndex, LiteLLM, Chroma DB and others to build and maintain promote and get feedback on our instrumentations.

Excellent! The open-telemetry project has a lot of experience working with other projects as well as commercial solutions to provide instrumentation. Another example of the open source observability community engaging with Vendors can be found in Prometheus's curation of a set of metrics exporters.

Both of these CNCF project communities have experience in navigating some of the technical and cross-project (and indeed cross-product) conversations and solution finding. I would encourage the project to engage, TAG Observability is a great place to do this, as it's where End Users, Vendors, and Project maintainers come together. The TAG and its WG's (e.g. AI working group referenced) are great places to collaboratively engage with the community in ideation, solution finding, and coordination of effort(s).

One more question: given that a lot of LLMs are not open source, have you checked that their SDKs are sufficiently open source that they don't create licensing problems for your project?

@jberkus their SDKs are open source nonetheless (see for example OpenAI, Anthropic, Cohere, Pinecone).

@nirga I think that it would help those learning about the project if there were a few tables in the documentation in the repo (.md)

SDK's and their license
projects for which OpenLLMetry is carrying patches or other custom code to work around/with other projects
projects that are importing and using the OpenLLMetry project

If these summarized the integration and it's type (with links to issues in openllmetry or in other projects), it would help new contributors or others that are evaluating the project to engage both with OpenLLMetry but also the broader, and rapidly growing community around LLM's.

@halcyondude thanks, I agree. Will do that

We would like the project to reapply in 6 months and complete the following

Engage with and present to the AI/ML working group
Bring resolution to the TAG observability recommendations provided previously and re-engage the TAG for confirmation of resolution.

Closing, project can reapply for June (or later) review

cncf / sandbox

[Sandbox] OpenLLMetry #67

Application contact emails

Project Summary

Project Description

Org repo URL (provide if all repos under the org are in scope of the application)

Project repo URL in scope of application

Additional repos in scope of the application

Website URL

Roadmap

Roadmap context

Contributing Guide

Code of Conduct (CoC)

Adopters

Contributing or Sponsoring Org

Maintainers file

IP Policy

Trademark and accounts

Why CNCF?

Benefit to the Landscape

Cloud Native 'Fit'

Cloud Native 'Integration'

Cloud Native Overlap

Similar projects

Landscape

Business Product or Service to Project separation

Project presentations

Project champions

Additional information

Feedback re: (this) Sandbox Application

Feedback from TAG Observability Presentation on 12/5

Additional Feedback and Concern(s)

Conclusion