explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

[R-254] Issue in Evaluation using local LLM #955

Open sheetalkamthe55 opened 1 month ago

sheetalkamthe55 commented 1 month ago

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

“WARNING:ragas.llms.output_parser:Failed to parse output. Returning None.”

I traced the requests and responses with LangSmith. For the given input prompt, I believe the issue is context length, because the model returns a blank response. I tried different LLMs, but the error remains the same.
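To see more detail around the failed parse than the single WARNING line, ragas's internal logging can be turned up before running the evaluation. This is a generic sketch using only the standard library:

```python
import logging

# Surface ragas's internal log messages (including details around the
# failed output parse) instead of only the one WARNING line.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("ragas").setLevel(logging.DEBUG)
```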

Code Examples

I hosted a Llama 2 model with llama-cpp-python's server. Below is the command I used:

python3 -m llama_cpp.server --model /tmp/llama_index/models/llama-13b.Q5_K_M.gguf --port 8009 --host 129.69.217.24 --chat_format llama-2
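If a too-small context window is the suspect, `llama_cpp.server` accepts an `--n_ctx` flag to raise it; the value below (4096) is an illustrative assumption, not something from the original report:

```shell
# Same server invocation as above, with an explicitly larger context window.
python3 -m llama_cpp.server \
  --model /tmp/llama_index/models/llama-13b.Q5_K_M.gguf \
  --port 8009 --host 129.69.217.24 \
  --chat_format llama-2 \
  --n_ctx 4096
```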

Here is a sample test set I am using: Ragas_dataset.csv

You can ignore the dataset part; I tried the same with fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval") and the same issue persists.

Code:

from datasets import load_dataset
dataset = load_dataset("csv", data_files="Ragas_dataset.csv")

from tqdm import tqdm
import pandas as pd
from datasets import Dataset
def create_ragas_dataset(eval_dataset):
  rag_dataset = []
  for row in tqdm(eval_dataset):
    rag_dataset.append(
        {"question": row["question"],
         # no generated answers available, so ground_truth doubles as the answer
         "answer": row["ground_truth"],
         "contexts": [row["contexts"]],
         "ground_truth": row["ground_truth"]
         }
    )
  rag_df = pd.DataFrame(rag_dataset)
  rag_eval_dataset = Dataset.from_pandas(rag_df)
  return rag_eval_dataset

basic_qa_ragas_dataset = create_ragas_dataset(dataset["train"].select(range(2)))
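Before building the Dataset, it may help to sanity-check the rows, since ragas metrics expect the columns question, answer, contexts (a list of strings), and ground_truth. A small stand-alone sketch; `validate_rows` is a hypothetical helper, not part of ragas:

```python
# Columns ragas metrics expect in the evaluation dataset.
REQUIRED_COLUMNS = {"question", "answer", "contexts", "ground_truth"}

def validate_rows(rows):
    """Hypothetical helper: return a list of (row_index, problem) pairs
    for rows missing required columns or with a non-list "contexts"."""
    problems = []
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            problems.append((i, f"missing columns: {sorted(missing)}"))
        elif not isinstance(row["contexts"], list):
            problems.append((i, "contexts should be a list of strings"))
    return problems
```

For example, running `validate_rows(rag_dataset)` inside create_ragas_dataset, just before `Dataset.from_pandas`, would flag malformed rows early rather than surfacing later as parse failures.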

from langchain.chat_models import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

inference_server_url = "http://localhost:8009/v1"

chat = ChatOpenAI(
    model="/tmp/llama_index/models/llama-13b.Q5_K_M.gguf",
    openai_api_key="no-key",
    openai_api_base=inference_server_url,
    max_tokens=5,  # note: this caps every completion at 5 tokens
    temperature=0,
)

vllm = LangchainLLMWrapper(chat)
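One thing worth double-checking in the configuration above: max_tokens=5 limits every completion to five tokens, which would truncate the structured output ragas tries to parse and could by itself produce the "Failed to parse output" warning. A sketch with a larger budget (512 is an arbitrary assumption), keeping the rest of the setup from above:

```python
from langchain.chat_models import ChatOpenAI

inference_server_url = "http://localhost:8009/v1"

chat = ChatOpenAI(
    model="/tmp/llama_index/models/llama-13b.Q5_K_M.gguf",
    openai_api_key="no-key",
    openai_api_base=inference_server_url,
    max_tokens=512,  # room for the full structured output ragas expects
    temperature=0,
)
```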

from ragas.metrics import (
    context_precision,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness

# change the LLM

faithfulness.llm = vllm
context_precision.llm = vllm
context_recall.llm = vllm
harmfulness.llm = vllm

from ragas import evaluate

result = evaluate(
    basic_qa_ragas_dataset, 
    metrics=[faithfulness]
)
result
[screenshot of the evaluation output]
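As an aside, recent ragas releases also accept the evaluator LLM as an argument to evaluate() instead of patching each metric's llm attribute; a sketch, assuming a ragas version that supports the llm= parameter:

```python
from ragas import evaluate

result = evaluate(
    basic_qa_ragas_dataset,
    metrics=[faithfulness],
    llm=vllm,  # wrapped LLM from LangchainLLMWrapper above
)
```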

Additional context Please let me know if I should provide more information


pauljaspersahr commented 1 month ago

Having the same issue with the Langchain Ollama LLM wrapper and Llama 3.

visionKinger commented 2 weeks ago

Having the same issue with Langchain Azure OpenAI.