Arize-ai / open-inference-spec

A specification for OpenInference, a semantic mapping of ML inferences
https://arize-ai.github.io/open-inference-spec/
MIT License

Relation to OpenTelemetry #33

Open · tmc opened this issue 11 months ago

tmc commented 11 months ago

Heya! I’m curious how you are thinking about how this effort relates to or interacts with the OpenTelemetry project.

mikeldking commented 11 months ago

Heya @tmc - yes, the traces are basically OpenTelemetry with a few different design constraints.

We believe OTEL will become increasingly important, but that fine-grained traces like OpenInference Tracing are also important for building first-class generative apps. I think the interaction with OpenTelemetry is not yet fleshed out, but I can see a future where these traces can be consumed as part of OTEL's distributed tracing.

janaka commented 10 months ago

Can you elaborate more on why you can't build on top of / extend OTel right now? Why is it in the future?

mikeldking commented 10 months ago

> Can you elaborate more on why you can't build on top of / extend OTel right now? Why is it in the future?

Hey @janaka - good question - it's something we ponder a lot. We mainly want to be deliberate in our use of OTEL: if we build directly on OTEL, we conflate APM with the "possible" tracing needs of LLM application introspection. OTEL is designed around distributed systems, and its context management is really designed around those boundaries, whereas in what we've started to inspect, the context is more application-specific (conversational applications, retrieval, etc.). In many ways you need a lot more information than traditional APM provides because you are dealing with unstructured data like documents and text. To answer more simply: we are mainly focused on the application-specific topology, so we started there. But as you mention, building on top of OTEL could be a good move since a lot of instrumentation already exists.

I know that doesn't fully answer your question, but we are focusing on capturing the right set of attributes and plan on supporting OTEL as a follow-up. Hope that helps a bit. Would love to hear your thoughts on the matter.
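To make that distinction concrete, here is a minimal sketch of carrying application-specific LLM context on ordinary OTel spans. The `llm.*` attribute names and the `call_model` stub are illustrative assumptions, not the OpenInference spec:

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-app")

def call_model(prompt: str) -> str:
    """Stand-in for a real LLM client call."""
    return f"echo: {prompt}"

def answer_question(question: str) -> str:
    # One span per LLM call. Unlike a typical APM span, it carries the
    # unstructured text the application needs for introspection.
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("llm.prompt", question)     # illustrative attribute
        span.set_attribute("llm.model_name", "gpt-4")  # names, not the spec
        completion = call_model(question)
        span.set_attribute("llm.completion", completion)
        return completion
```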

janaka commented 10 months ago

Hi @mikeldking, thanks for your response.

Yes, for sure there are ML specifics needed - no getting away from that. It makes sense that you are focusing on figuring out what the model for this domain should look like first; that's the value prop after all. It's also great that you've based it on OTel - that was definitely the right move in my view, rather than going bespoke.

From a usage point of view, having one system for wiring up the tracing (and metrics and logs) and pushing to different backends that are task/domain specific makes a big difference mentally. Docq is doing RAG, so there's the index+search side as well.

Over the last few days I spent some time instrumenting Docq with OTel tracing. The auto-instrumentation gets you started OOTB fast, but of course you need to add application-specific events/spans to make it useful. That's not complicated, but I wish Copilot would just add the first round of function decorators for me. Very quickly I hit the limits of not getting much visibility into LlamaIndex. I created a LlamaIndex callback handler to give me a little more visibility, but I don't think it's sufficient. I had a stab at creating an OpenTelemetry instrumentor but had to park that. I think Traceloop is planning to release one for LlamaIndex - going to see what that looks like.
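That first round of manual instrumentation could look something like the sketch below - a hypothetical `traced` decorator that wraps each RAG step in its own span (the decorator, span names, and step functions are made up for illustration):

```python
import functools
from opentelemetry import trace

tracer = trace.get_tracer("docq.rag")

def traced(span_name: str):
    """Decorator that runs the wrapped function inside a named OTel span."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            with tracer.start_as_current_span(span_name):
                return fn(*args, **kwargs)
        return wrapper
    return decorator

@traced("rag.retrieve")
def retrieve(query: str) -> list[str]:
    return ["doc-1", "doc-2"]  # placeholder retrieval step

@traced("rag.generate")
def generate(query: str, docs: list[str]) -> str:
    return f"answer to {query!r} using {len(docs)} docs"  # placeholder generation
```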

Right now I think I need end-to-end tracing within the app - so more traditional APM - especially given we are intentionally a single-process monolith. Then I want to be able to get more visibility into the RAG pipeline: both the indexing and the search/prompt/generation on the usage side (Chat/Q&A/Agents). Evaluation is part of this, I feel. No doubt there are differences between the development and production use cases; right now we are more focused on the development-time needs.

mikeldking commented 10 months ago

@janaka this is super insightful - thank you. I think investing in OTEL makes a ton of sense because it gives you maximum visibility across various boundaries - definitely worth the investment. I would never deploy a distributed system without it now. The biggest hurdle, as you say, is instrumentation - and the need for auto-instrumentation for LLM orchestration and LLM providers. That's where we are starting to converge: we have OpenAI instrumentation, AWS Bedrock instrumentation is coming very soon, and we will tackle the context management to stitch the spans together. At that point I think we will figure out how this gets exported to OTEL as well as to other collectors like arize-phoenix. Will keep you up to date as we make progress towards end-to-end tracing.

mikeldking commented 8 months ago

Update: we've started moving our instrumentation over to a monorepo that will house OTEL instrumentation (https://github.com/Arize-ai/openinference). Phoenix now supports OTEL via OTLP, so you can send traces to Phoenix using OTEL!
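A minimal sketch of that OTLP path, assuming a locally running Phoenix instance on its default local port (check the Phoenix docs for the exact endpoint in your setup):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Route all spans through a standard OTLP/HTTP exporter pointed at Phoenix.
# The endpoint below is an assumption based on Phoenix's default local port.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("docq")
with tracer.start_as_current_span("hello-phoenix"):
    pass  # any spans created here are exported to Phoenix over OTLP
```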