Arize-ai / phoenix

AI Observability & Evaluation
https://docs.arize.com/phoenix

Is there any support for Hugging Face models for RAG eval? #2140

Closed mujhenahiata closed 7 months ago

dosubot[bot] commented 7 months ago

🤖

Hey @mujhenahiata! Good to see you back here. How's your day going?

Yes, the Arize Phoenix repository supports evaluating Retrieval-Augmented Generation (RAG) pipelines, and Hugging Face models can be used in them. The repository includes a tutorial that demonstrates how to build a RAG pipeline and evaluate it with Phoenix Evals.

The evaluation process includes generating question-context pairs, performing retrieval evaluations, and computing retrieval metrics such as Normalized Discounted Cumulative Gain (NDCG) and precision. The results of the evaluations are then sent to Phoenix for visualization.

Here is a code snippet that installs the necessary packages and sets up the environment for the evaluation:

!pip install -qq "arize-phoenix[experimental,llama-index]>=2.0"

import os
from getpass import getpass

import nest_asyncio
import pandas as pd

import phoenix as px
from llama_index import SimpleDirectoryReader, VectorStoreIndex, set_global_handler
from llama_index.llms import OpenAI
from llama_index.node_parser import SimpleNodeParser

nest_asyncio.apply()  # allow nested async event loops inside a notebook

# The default LlamaIndex LLM and the evaluators below call OpenAI, so set the API key
if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

# Start the Phoenix app and instrument LlamaIndex so traces are collected
px.launch_app()
set_global_handler("arize_phoenix")
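To give a concrete picture of the pipeline the tutorial evaluates, here is a minimal sketch of the indexing and querying steps that produce the traces Phoenix records. The ./data directory and the example questions are placeholders, and get_retrieved_documents is the helper the tutorial uses to export retrieved-document spans from the running Phoenix session; treat the exact names as assumptions if your Phoenix or LlamaIndex version differs.

# Build a simple RAG pipeline over a local directory of documents (placeholder path)
documents = SimpleDirectoryReader("./data").load_data()
node_parser = SimpleNodeParser.from_defaults(chunk_size=512)
nodes = node_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine()

# Run a few queries so Phoenix records retrieval spans (placeholder questions)
for question in ["What is Phoenix?", "How is retrieval quality evaluated?"]:
    query_engine.query(question)

# Export the retrieved documents recorded by Phoenix into a dataframe for evaluation
from phoenix.session.evaluation import get_retrieved_documents

retrieved_documents_df = get_retrieved_documents(px.Client())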

And here is a code snippet that evaluates the RAG system's retrieval performance:

from phoenix.experimental.evals import (
    OpenAIModel,
    RelevanceEvaluator,
    run_evals,
)

# LLM-as-a-judge evaluator that scores each retrieved document's relevance to its query
relevance_evaluator = RelevanceEvaluator(OpenAIModel(model_name="gpt-4-1106-preview"))

# retrieved_documents_df holds the retrieved documents exported from Phoenix (see above)
retrieved_documents_relevance_df = run_evals(
    evaluators=[relevance_evaluator],
    dataframe=retrieved_documents_df,
    provide_explanation=True,
    concurrency=20,
)[0]

# Aggregate the relevance scores across the retrievals
results = retrieved_documents_relevance_df.mean(numeric_only=True)
results
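To go from those per-document relevance scores to the NDCG and precision metrics mentioned above, and to send the results back to Phoenix for visualization, something along the following lines works. This is a sketch rather than the tutorial's exact code: it assumes the run_evals output keeps the tutorial's (span_id, document_position) index with a numeric "score" column, uses scikit-learn's ndcg_score for NDCG, and assumes the DocumentEvaluations / log_evaluations names from the tutorial-era Phoenix client API.

import numpy as np
from sklearn.metrics import ndcg_score

def ndcg_at_k(relevance, k=2):
    # relevance: 0/1 scores in the order the documents were retrieved
    if len(relevance) < 2:
        return float(relevance[0]) if len(relevance) else np.nan
    ranking = -np.arange(len(relevance), dtype=float)  # earlier positions rank higher
    return ndcg_score([relevance], [ranking], k=k)

def precision_at_k(relevance, k=2):
    # fraction of the top-k retrieved documents that are relevant
    return float(np.mean(relevance[:k]))

# Group each query's documents together (rows assumed ordered by document position)
per_query_scores = retrieved_documents_relevance_df.groupby(level=0)["score"]
ndcg_at_2 = per_query_scores.apply(lambda s: ndcg_at_k(s.to_numpy(), k=2))
precision_at_2 = per_query_scores.apply(lambda s: precision_at_k(s.to_numpy(), k=2))

# Attach the document-level relevance evals to the traces so they appear in the Phoenix UI
from phoenix.trace import DocumentEvaluations

px.Client().log_evaluations(
    DocumentEvaluations(eval_name="Relevance", dataframe=retrieved_documents_relevance_df)
)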

You can find more details in the evaluate_rag.ipynb tutorial in the Arize Phoenix repository.

Please let me know if you need further assistance or have any other questions.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

amank94 commented 7 months ago

@mujhenahiata You can also use LiteLLM to run evals with a model from Hugging Face. See the usage here: https://docs.arize.com/phoenix/api/evaluation-models#phoenix.experimental.evals.litellmmodel

and the LiteLLM Hugging Face provider docs here: https://docs.litellm.ai/docs/providers/huggingface
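As a rough illustration, swapping a Hugging Face model into the eval snippet above might look like the sketch below. It assumes LiteLLMModel accepts model_name like the other evals models, that LiteLLM's Hugging Face provider reads the Inference API token from HUGGINGFACE_API_KEY, and the huggingface/<repo-id> model string and repo id are placeholders.

import os

from phoenix.experimental.evals import LiteLLMModel, RelevanceEvaluator, run_evals

# LiteLLM's Hugging Face provider reads the Inference API token from this variable
os.environ["HUGGINGFACE_API_KEY"] = "hf_..."  # placeholder token

# "huggingface/<repo-id>" routes the request through LiteLLM to the Hugging Face Inference API
hf_model = LiteLLMModel(model_name="huggingface/mistralai/Mistral-7B-Instruct-v0.2")

relevance_evaluator = RelevanceEvaluator(hf_model)

retrieved_documents_relevance_df = run_evals(
    evaluators=[relevance_evaluator],
    dataframe=retrieved_documents_df,
    provide_explanation=True,
)[0]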

amank94 commented 7 months ago

Here is a notebook that has an example hugging face model: https://github.com/Arize-ai/phoenix/blob/ff6df1fc01f0986357a9e20e0441a3c15697a5fa/examples/using_llamaindex_with_huggingface_models.ipynb#L4
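On the LlamaIndex side, a rough sketch of swapping a local Hugging Face model into the traced pipeline (using the llama-index 0.9-era HuggingFaceLLM and ServiceContext APIs from the snippets above; the repo id and data directory are placeholders) could look like this:

import phoenix as px
from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    VectorStoreIndex,
    set_global_handler,
)
from llama_index.llms import HuggingFaceLLM

px.launch_app()
set_global_handler("arize_phoenix")  # trace the pipeline into Phoenix

# A local Hugging Face model in place of OpenAI (example repo id)
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    max_new_tokens=256,
)

# "local" embeddings avoid any OpenAI dependency (requires sentence-transformers)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()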