[ENHANCEMENT] add the ability to filter spans by document evaluations

Arize-ai / phoenix

AI Observability & Evaluation

https://docs.arize.com/phoenix

[ENHANCEMENT] add the ability to filter spans by document evaluations #4089

Open axiomofjoy opened 3 months ago

axiomofjoy commented 3 months ago

Users debugging RAG applications want to filter their spans to determine whether a fault lies with the retriever or the LLM. We need a way for users to filter spans by document evaluations using our query DSL, so they can write filters such as:

document_evals["relevance"].label == "relevant" and evals["qa_correctness"].label == "incorrect"

https://arize-ai.slack.com/archives/C04R3GXC8HK/p1722442515198979?thread_ts=1722436956.324489&cid=C04R3GXC8HK

RogerHYang commented 3 months ago

Because document evals and span evals are attached to different spans, this is essentially blocked, unless we can apply the span-eval filter on the root span and still include its sub-spans (i.e. where the doc evals are) when calculating document evaluation summaries.
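
To make the idea above concrete, here is a rough sketch of trace-level matching in plain Python. It does not use any Phoenix API: the `Span` dataclass, its fields, and `traces_matching` are hypothetical names invented for illustration. It also assumes that a document eval "matches" when at least one retrieved document carries the label; other semantics (e.g. all documents) are equally plausible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Span:
    # Hypothetical, simplified span record (not the Phoenix data model).
    span_id: str
    trace_id: str
    parent_id: Optional[str]
    # Span-level evals: eval name -> label.
    evals: Dict[str, str] = field(default_factory=dict)
    # Document evals: eval name -> one label per retrieved document.
    document_evals: Dict[str, List[str]] = field(default_factory=dict)


def traces_matching(
    spans: List[Span],
    span_eval: str,
    span_label: str,
    doc_eval: str,
    doc_label: str,
) -> Set[str]:
    """Return ids of traces whose root span has the given span-eval label
    and in which some span has at least one document with the given
    document-eval label."""
    root_ok: Set[str] = set()
    doc_ok: Set[str] = set()
    for s in spans:
        # Span-level condition is checked on root spans only.
        if s.parent_id is None and s.evals.get(span_eval) == span_label:
            root_ok.add(s.trace_id)
        # Document-level condition may be satisfied by any span in the trace.
        if doc_label in s.document_evals.get(doc_eval, []):
            doc_ok.add(s.trace_id)
    # Both conditions must hold within the same trace.
    return root_ok & doc_ok


spans = [
    Span("a1", "t1", None, evals={"qa_correctness": "incorrect"}),
    Span("a2", "t1", "a1", document_evals={"relevance": ["relevant", "irrelevant"]}),
    Span("b1", "t2", None, evals={"qa_correctness": "incorrect"}),
    Span("b2", "t2", "b1", document_evals={"relevance": ["irrelevant"]}),
    Span("c1", "t3", None, evals={"qa_correctness": "correct"}),
    Span("c2", "t3", "c1", document_evals={"relevance": ["relevant"]}),
]

matched = traces_matching(spans, "qa_correctness", "incorrect", "relevance", "relevant")
```

In this toy data only trace `t1` has both an incorrect answer on the root span and a relevant retrieved document on a sub-span, which points at the LLM rather than the retriever. Joining on the trace id is what lets the two conditions live on different spans; document evaluation summaries could then be computed over all spans of the matched traces.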