Arize-ai / openinference

OpenTelemetry Instrumentation for AI Observability
https://arize-ai.github.io/openinference/
Apache License 2.0

[bug][llama_index] Not full tracing captured in llama-index #289

Open diicellman opened 6 months ago

diicellman commented 6 months ago

Describe the bug
I have an instance of Arize Phoenix running in a Docker container. I had been using the instrument.py example for tracing, and there were no problems. Today I pulled the latest Docker container (3.16.0), and now tracing captures only "chunking" spans and nothing more. Previously it captured the full trace for query engine calls, etc.

Here are my specs:

- python = 3.10
- llama-index = "^0.10.19"
- openinference-semantic-conventions = "^0.1.5"
- openinference-instrumentation-llama-index = "^1.2.0"
- opentelemetry-exporter-otlp = "^1.23.0"
- llama-index-readers-telegram = "^0.1.4"
- llama-index-llms-anthropic = "^0.1.6"
- llama-index-callbacks-arize-phoenix = "^0.1.4"
- arize-phoenix = {extras = ["evals"], version = "^3.16.0"}


Expected behavior
To see the full trace.

(Screenshot: trace_details_view)

Screenshots
This is what I get after running the query engine: only "chunking".

Screenshot 2024-03-18 at 12 26 28


mikeldking commented 6 months ago

@diicellman I saw exactly the same regression in llama-index myself. I haven't had time to dig into it yet, but I've personally pinned llama-index to 0.10.19 for the time being. We will investigate further and get back to you. Note that they soft-deprecated callbacks in 0.10.20; my guess is that this is causing the problem.

diicellman commented 6 months ago

Thank you for the reply! I was also thinking that llama-index's callbacks updates could be the cause of the issue.

mikeldking commented 6 months ago

So far we've reproduced orphaned chunking spans when streams are not consumed. This happens because the chunking spans are exported as soon as they complete, but the overall trace is never "closed" while the stream remains unconsumed.
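The mechanism can be sketched with a stdlib-only analogy (no llama-index involved; all names are illustrative): the "root span" cleanup runs only when the stream is exhausted, so an unconsumed stream leaves the trace open while inner spans have already been exported.

```python
closed = []

def streaming_response(tokens):
    # Stand-in for a streaming LLM response: the enclosing "root span"
    # can only be closed once the stream is fully consumed, because the
    # cleanup lives in the generator's finally block.
    try:
        for token in tokens:
            yield token
    finally:
        closed.append("root span closed")

gen = streaming_response(["a", "b"])
assert closed == []             # nothing consumed yet: the span stays open
assert list(gen) == ["a", "b"]  # consuming the stream to the end...
assert closed == ["root span closed"]  # ...finally closes the root span
```

If the caller abandons `gen` without iterating it, the `finally` block never runs in a timely way and the inner spans show up without their parent, which matches the orphaned-chunking symptom described above.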

Screenshot 2024-03-21 at 3 24 57 PM

While not the same as above, it does give us some indication that traces are not shutting down as expected in some situations.

mikeldking commented 6 months ago

After some investigation we don’t think the callbacks are working as they previously did. We tried running the LlamaDebugHandler notebook (https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/callbacks/LlamaDebugHandler.ipynb) and we don’t see any traces being printed anymore.

Screenshot 2024-03-21 at 3 46 06 PM

We are working with the llama-index team to resolve.

mikeldking commented 6 months ago

@diicellman I just hit this issue again with someone else and want to double-check one thing: it's imperative that instrument() is called BEFORE any llama-index initialization happens. I'm not sure this will solve anything for you, but I wanted to double-check, since the screenshots below show the difference after changing when instrumentation is called:

Before:

Screenshot 2024-03-21 at 4 49 54 PM

After:

Screenshot 2024-03-21 at 5 53 13 PM
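Why ordering matters can be illustrated with a stdlib-only sketch (all class and function names here are hypothetical, not llama-index APIs): if engines snapshot the available handlers at construction time, anything built before instrument() runs never sees the tracer.

```python
spans = []

class CallbackManager:
    # Global handler registry; a stand-in for a library's callback manager.
    handlers = []

class QueryEngine:
    def __init__(self):
        # The engine snapshots the handlers available at construction time.
        self.handlers = list(CallbackManager.handlers)

    def query(self, q):
        for handler in self.handlers:
            handler(q)
        return f"answer:{q}"

def instrument():
    # Registers a tracing handler, as an instrumentor would.
    CallbackManager.handlers.append(lambda q: spans.append(f"span:{q}"))

early = QueryEngine()   # built BEFORE instrument(): empty snapshot
instrument()
late = QueryEngine()    # built AFTER: snapshot includes the tracer

early.query("a")        # produces no span
late.query("b")         # produces a span
assert spans == ["span:b"]
```

Under this assumption, calling instrument() first guarantees every engine picks up the tracing handler, which is consistent with the before/after screenshots.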

diicellman commented 6 months ago

Thank you for your help! I'm calling instrument() in the main.py FastAPI file on application startup, before any llama-index calls.

diicellman commented 6 months ago

I just updated the libraries to the latest version, and I'm encountering the same chunking traces rather than the full ones. I was considering that perhaps the problem is related to asynchronous usage. Both the endpoints in my app and the llama-index's query_engine.aquery() are asynchronous. However, when I made them non-asynchronous, I still got the same chunking traces.
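A quick stdlib-only sketch supports the finding that async usage alone isn't the culprit: as long as the coroutine is actually awaited to completion, a span opened inside it closes normally (the `Span` class below is a hypothetical stand-in, not a real tracing API).

```python
import asyncio

closed = []

class Span:
    # Minimal stand-in for a tracing span used as a context manager.
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        closed.append(self.name)

async def aquery(question):
    # The span closes when the coroutine finishes, which happens
    # whenever the caller awaits it to completion.
    with Span("query"):
        await asyncio.sleep(0)
        return f"answer:{question}"

result = asyncio.run(aquery("hi"))
assert result == "answer:hi"
assert closed == ["query"]
```

Since both the sync and awaited-async paths close their spans, the orphaned chunking spans more likely stem from unconsumed streams or from instrumentation ordering than from aquery() itself.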

mikeldking commented 6 months ago

> I just updated the libraries to the latest version, and I'm encountering the same chunking traces rather than the full ones. I was considering that perhaps the problem is related to asynchronous usage. Both the endpoints in my app and the llama-index's query_engine.aquery() are asynchronous. However, when I made them non-asynchronous, I still got the same chunking traces.

Oh interesting. Is your code on GitHub by chance? Would love to unblock you @diicellman

diicellman commented 6 months ago

Yes, my code is on GitHub, but it's in a private repository. If you need to review the code, I can share the crucial parts. Sorry for any inconvenience.

mikeldking commented 6 months ago

Moving this to our backlog for now. We've communicated the lack of a trace tree in some contexts to llama-index and they are investigating.

mikeldking commented 4 months ago

We will probably fix this via the new llama-index instrumentation system rather than the callbacks.