explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Answer Relevancy with Own LLM: Kernel Dies #345

Closed weissenbacherpwc closed 9 months ago

weissenbacherpwc commented 11 months ago

Hi,

I want to evaluate my RAG application. Computing faithfulness, context_precision and context_recall with my own LLM (Llama-based) works, but when I try to compute the answer_relevancy score, I either get an "OpenAI key not found" error (I don't want to use OpenAI) or my kernel dies. This is what my DatasetDict looks like:

DatasetDict({
    train: Dataset({
        features: ['question', 'answer', 'contexts', 'ground_truths'],
        num_rows: 2 
    })
})

I am loading my own embeddings, following the documentation:

from ragas.metrics import AnswerRelevancy
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name='intfloat/multilingual-e5-large', model_kwargs={'device': 'mps'})
answer_relevancy = AnswerRelevancy(
    embeddings=embeddings
)

If I try the following, I get an "OpenAIKeyNotFound" error:

#answer_relevancy.llm = llm
# init_model to load models used
answer_relevancy.init_model()

results = answer_relevancy.score(em_mistral_extracted["train"])

When I instead set answer_relevancy.llm to my own LLM first, my kernel crashes:

answer_relevancy.llm = llm
# init_model to load models used
answer_relevancy.init_model()

results = answer_relevancy.score(em_mistral_extracted["train"])

This results in: "Kernel crashed while executing code in the current cell or a previous cell"

At the moment the LLM and the embeddings come from different models. Using the same model for both the LLM and the embeddings leads to the same issue.

So any suggestions here?

jjmachan commented 11 months ago

hey @weissenbacherpwc thanks for raising this. So there are 2 issues here, right?

  1. answer relevancy for custom LLMs: have you fixed that? The code you mentioned should work.
  2. kernel error: is this reproducible? I personally experienced this once but have not been able to reproduce it since. It's some bug with how ragas is handling the event loop, but if you could tell me how to reproduce it I would be able to fix it. In the meantime there is one workaround you could try, sketched below.
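
If you're running this in a notebook, one thing worth trying (just a guess on my side, not a confirmed fix) is nest_asyncio, which lets asyncio.run() be called even though the notebook's event loop is already running:

# workaround sketch: allow nested event loops inside Jupyter
# (extra dependency: pip install nest_asyncio)
import nest_asyncio
nest_asyncio.apply()

# then run the metric as before
results = answer_relevancy.score(em_mistral_extracted["train"])
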
weissenbacherpwc commented 11 months ago

Hi @jjmachan, basically there is one broader issue: the kernel only dies when I use "answer_relevancy" with my own LLM. Other metrics like "context_recall" or "faithfulness" work fine. Here is my code for reproduction:

from ragas.llms import LangchainLLM
from src.llm import build_llm

llm = build_llm("/Modelle/em_german_mistral_v01.Q5_K_M.gguf") 
# llm is type: <class 'langchain.llms.llamacpp.LlamaCpp'>
vllm = LangchainLLM(llm=llm)
# vllm is type <class 'ragas.llms.langchain.LangchainLLM'>

from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
    answer_correctness,
    answer_similarity, 
    context_relevancy
)
from ragas.metrics.critique import harmfulness

# change the LLM

faithfulness.llm = vllm
answer_relevancy.llm = vllm
context_precision.llm = vllm
context_recall.llm = vllm
harmfulness.llm = vllm
answer_correctness.llm = vllm
answer_similarity.llm = vllm
context_relevancy.llm = vllm

test = answer_relevancy.score(dataset["train"])

When executing test = answer_relevancy.score(...), the kernel dies. But it works fine for:

dataset = context_recall.score(dataset["train"])

My dataset is a DatasetDict:

DatasetDict({
    train: Dataset({
        features: ['question', 'answer', 'contexts', 'ground_truths'],
        num_rows: 35
    })
})
jjmachan commented 11 months ago

Which embedding model are you using? And can you tell me which ragas version you're using, too?

One likely cause is that a local embedding model is clashing with the multithreading/async logic we have internally. Since this only happens for answer_relevancy, it should be something on the embedding side, because that is the only metric here that uses embeddings.
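
A quick way to test that theory is to hit the embedding model from a few threads yourself, outside of ragas (just a sketch, reusing the HuggingFaceEmbeddings setup from your first comment):

# rough check: does the local embedding model survive concurrent calls?
from concurrent.futures import ThreadPoolExecutor
from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="intfloat/multilingual-e5-large",
    model_kwargs={"device": "mps"},
)

texts = ["a question", "an answer", "some retrieved context"] * 5

# if the kernel already dies here, the crash is in the model under
# threads/MPS rather than in ragas itself
with ThreadPoolExecutor(max_workers=4) as pool:
    vectors = list(pool.map(embeddings.embed_query, texts))

print(len(vectors), len(vectors[0]))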

weissenbacherpwc commented 11 months ago

I am using the intfloat/multilingual-e5-large embedding model from Huggingface:

from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name='intfloat/multilingual-e5-large', model_kwargs={'device': 'mps'})
answer_relevancy = AnswerRelevancy(
    embeddings=embeddings
)

I have also tried a LlamaCpp embedding, with the same problem:

from langchain.embeddings import LlamaCppEmbeddings
llama = LlamaCppEmbeddings(model_path="/path/to/model.bin")

My Ragas Version: "0.0.21"

jjmachan commented 11 months ago

Got it, thank you :) I'm thinking about how to unblock you as quickly as possible and will try to figure out a workaround. I'll get back to you as soon as I can.

weissenbacherpwc commented 11 months ago

I tried upgrading ragas to 0.0.22, without success.

shahules786 commented 11 months ago

Hi @weissenbacherpwc, there were some delays in fixing this due to some personal reasons. As a quick fix, you can use an older version of ragas (without async support), i.e. anything below v0.0.20.
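
For example, pinning it with pip (assuming a pip-based setup):

pip install "ragas<0.0.20"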

weissenbacherpwc commented 11 months ago

I tried version 0.0.19 and got the following error: RuntimeError: asyncio.run() cannot be called from a running event loop. With version 0.0.18 I can't do from ragas.llms import LangchainLLM at all.

So downgrading the version does not seem to solve the issue.

shahules786 commented 11 months ago

Hi @weissenbacherpwc, we just added support for FastEmbed, which works without the async issues.

from ragas.embeddings import FastEmbedEmbeddings
embedding_model = FastEmbedEmbeddings("BAAI/bge-small-en")

you can find the list of supported models here

PS: install ragas from source before trying this out.
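
Wiring it into the metric follows the same pattern as with HuggingFaceEmbeddings earlier in this thread (rough sketch):

from ragas.embeddings import FastEmbedEmbeddings
from ragas.metrics import AnswerRelevancy

embedding_model = FastEmbedEmbeddings(model_name="BAAI/bge-small-en")
answer_relevancy = AnswerRelevancy(embeddings=embedding_model)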

weissenbacherpwc commented 11 months ago

> Hi @weissenbacherpwc, we just added support for FastEmbed, which works without the async issues.
>
> from ragas.embeddings import FastEmbedEmbeddings
> embedding_model = FastEmbedEmbeddings("BAAI/bge-small-en")
>
> you can find the list of supported models here
>
> PS: install ragas from source before trying this out.

So I installed ragas from source with pip install "git+https://github.com/explodinggradients/ragas.git" and also ran pip install fastembed.

Setting the Embedding works:

model_name = "intfloat/multilingual-e5-large"
embedding_model = FastEmbedEmbeddings(model_name=model_name)

When calling answer relevancy, I get this: AnswerRelevancy(batch_size=15, llm=<ragas.llms.langchain.LangchainLLM object at 0x344ef9450>, name='answer_relevancy', evaluation_mode=<EvaluationMode.qac: 1>, strictness=3, embeddings=FastEmbedEmbeddings(model_name='intfloat/multilingual-e5-large', max_length=512, cache_dir=None, threads=None, doc_embed_type='default', cache_folder=None, _model=<fastembed.embedding.FlagEmbedding object at 0x10755cd90>))

But setting the embeddings had already worked before with my own embeddings and my own model. The problem still occurs: my kernel dies when calling test = answer_relevancy.score(dataset["train"])

jjmachan commented 10 months ago

I will get back to this, but it is a tough bug to crack. Can you run the embedding model in some sort of server, so that ragas only sends it requests? Would you be willing to try that out?
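
Roughly what I have in mind (just a sketch, not tested; the /embed endpoint below is made up, any embedding server would do):

# client-side wrapper: ragas' worker threads then only do network I/O,
# while the actual model runs in a separate server process
from typing import List

import requests
from langchain.embeddings.base import Embeddings

class RemoteEmbeddings(Embeddings):
    # talks to a hypothetical server exposing POST /embed that
    # returns {"embeddings": [[...], ...]}
    def __init__(self, url: str = "http://localhost:8000/embed"):
        self.url = url

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        resp = requests.post(self.url, json={"texts": texts})
        resp.raise_for_status()
        return resp.json()["embeddings"]

    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]

# then plug it into the metric as before:
# answer_relevancy.embeddings = RemoteEmbeddings()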

normaljosh commented 10 months ago

I'm having a similar issue and would be willing to try running the embedding on a server, but I'm not clear on what the code would look like on the ragas side. If I had a server running BAAI/bge-base-en-v1.5 and returning embeddings, what ragas embeddings object would I use?

jjmachan commented 9 months ago

hey @weissenbacherpwc and @normaljosh, this has been fixed with v0.1 and above - could you try it out?

feel free to reopen in case you run into any other troubles
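
With v0.1 you pass the LLM and embeddings straight into evaluate(), so the whole thing looks roughly like this (a sketch, swap in your own LangChain LLM and embedding objects):

from ragas import evaluate
from ragas.metrics import answer_relevancy

# llm / embeddings are your LangChain LLM and embedding model
result = evaluate(
    dataset["train"],
    metrics=[answer_relevancy],
    llm=llm,
    embeddings=embeddings,
)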

anirbanpupi commented 8 months ago

I am facing a similar issue when I try to generate a synthetic dataset using my custom LLM and embeddings. I am using TheBloke/Llama-2-13B-chat-GPTQ as the LLM and BAAI/bge-large-en-v1.5 as the embedding model.

fr0zenshard commented 8 months ago

same

SuperYG1991 commented 8 months ago

The kernel dies when evaluating a custom dataset using a custom LLM and embeddings.

# imports (LangChain LLM/embeddings plus the ragas entrypoint and metrics)
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.llms import LlamaCpp

from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    answer_similarity,
    faithfulness,
)

# local Mistral model served via llama.cpp as the evaluation LLM
eval_llm = LlamaCpp(
    model_path="./model/Mistral-7B-Instruct-v0.2/mistral-7b-instruct-v0.2.Q8_0.gguf",
    n_ctx=3900,
    n_gpu_layers=-1,
    n_batch=512,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=False,
)

# local BGE model as the evaluation embeddings
eval_embed = HuggingFaceBgeEmbeddings(
    model_name='./model/bge-large-en-v1.5',
    model_kwargs={'device': 'cuda'},
    encode_kwargs={'normalize_embeddings': True},
)

result = evaluate(
    evalsets,
    metrics=[
        faithfulness,
        answer_correctness,
        answer_similarity,
        answer_relevancy,
    ],
    llm=eval_llm,
    embeddings=eval_embed,
)
anirbanpupi commented 8 months ago

I already opened an issue about this almost 2 weeks ago, but the bug has not been fixed yet.