weissenbacherpwc closed this issue 9 months ago
hey @weissenbacherpwc, thanks for raising this. So there are 2 issues here, right?
Hi @jjmachan, basically there is one broader issue: the kernel only dies when I use "answer_relevancy" with my own LLM. Other metrics like "context_recall" or "faithfulness" work fine. Here is my code for reproduction:
from ragas.llms import LangchainLLM
from src.llm import build_llm
llm = build_llm("/Modelle/em_german_mistral_v01.Q5_K_M.gguf")
# llm is type: <class 'langchain.llms.llamacpp.LlamaCpp'>
vllm = LangchainLLM(llm=llm)
# vllm is type <class 'ragas.llms.langchain.LangchainLLM'>
from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
    answer_correctness,
    answer_similarity,
    context_relevancy,
)
from ragas.metrics.critique import harmfulness
# change the LLM
faithfulness.llm = vllm
answer_relevancy.llm = vllm
context_precision.llm = vllm
context_recall.llm = vllm
harmfulness.llm = vllm
answer_correctness.llm = vllm
answer_similarity.llm = vllm
context_relevancy.llm = vllm
test = answer_relevancy.score(dataset["train"])
When executing test = answer_relevancy.score(...), the kernel dies. But it works fine for:
dataset = context_recall.score(dataset["train"])
My dataset is a DatasetDict:
DatasetDict({
    train: Dataset({
        features: ['question', 'answer', 'contexts', 'ground_truths'],
        num_rows: 35
    })
})
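For reference, a dataset with that shape can be built with the datasets library; a minimal sketch (the example row is a hypothetical placeholder, the column names follow the ragas 0.0.x schema used in this thread):
# Sketch: building a DatasetDict in the shape shown above.
from datasets import Dataset, DatasetDict

train = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris has been the capital of France since 508."]],
    "ground_truths": [["Paris is the capital of France."]],
})
dataset = DatasetDict({"train": train})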
Which embedding are you using? And can you tell me which ragas version you're using too?
One likely situation is that a local embedding model is causing the issue, since we have some multithreading/async logic happening. Since this only happens for answer_relevancy, it should be something on the embedding side, because that is the only metric that uses embeddings.
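One quick way to test that hypothesis, independently of ragas, is to call the embedding object directly, both synchronously and from a worker thread. A minimal sketch; here `embeddings` stands for whatever LangChain embeddings object is passed to the metric:
# Sketch: check whether the embedding model alone survives threaded use.
from concurrent.futures import ThreadPoolExecutor

texts = ["What is the capital of France?", "Paris is the capital of France."]

# 1) plain synchronous call
vectors = embeddings.embed_documents(texts)
print(len(vectors), len(vectors[0]))

# 2) the same call from a worker thread, mimicking a threaded/async code path
with ThreadPoolExecutor(max_workers=2) as pool:
    print(len(pool.submit(embeddings.embed_documents, texts).result()))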
I am using the intfloat/multilingual-e5-large embedding model from Huggingface:
from langchain.embeddings import HuggingFaceEmbeddings
from ragas.metrics import AnswerRelevancy

embeddings = HuggingFaceEmbeddings(model_name='intfloat/multilingual-e5-large', model_kwargs={'device': 'mps'})
answer_relevancy = AnswerRelevancy(
    embeddings=embeddings
)
I have also tried a LlamaCpp embedding, with the same problem:
from langchain.embeddings import LlamaCppEmbeddings
llama = LlamaCppEmbeddings(model_path="/path/to/model.bin")
My ragas version: 0.0.21
Got it, thank you :) I'm thinking about how to unblock you as quickly as possible and will get back to you with a workaround soon.
I tried upgrading to ragas 0.0.22, without success.
Hi @weissenbacherpwc, there were some delays in fixing this due to personal reasons. As a quick fix, you can use an older version of ragas (without async support), i.e. < v0.0.20.
I tried with version 0.0.19 and got the following error: RuntimeError: asyncio.run() cannot be called from a running event loop.
When trying version 0.0.18, I can't import LangchainLLM from ragas.llms.
So downgrading the version does not seem to solve the issue.
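As a side note, the asyncio.run() error on 0.0.19 is the usual symptom of calling asyncio.run() inside a Jupyter kernel that already has a running event loop. A commonly used, non-ragas-specific workaround is to patch the loop with nest_asyncio before scoring; whether it actually helps in this case is untested:
# Sketch of a generic notebook workaround (requires `pip install nest_asyncio`).
import nest_asyncio

nest_asyncio.apply()  # allow nested event loops inside the Jupyter kernel

# then retry the metric, e.g.:
# test = answer_relevancy.score(dataset["train"])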
Hi @weissenbacherpwc, we just added support for FastEmbed, which works without the async issues.
from ragas.embeddings import FastEmbedEmbeddings
embedding_model = FastEmbedEmbeddings("BAAI/bge-small-en")
You can find the list of supported models here.
PS: install ragas from source before trying this out.
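Putting that suggestion together with the setup from earlier in the thread, the wiring would presumably look something like this (a sketch only; `vllm` is the LangchainLLM wrapper defined above, and it assumes the metric's attributes can be set directly, as done earlier):
# Sketch: keep the custom LLM, swap the metric's embeddings for FastEmbed.
from ragas.embeddings import FastEmbedEmbeddings
from ragas.metrics import answer_relevancy

embedding_model = FastEmbedEmbeddings("BAAI/bge-small-en")

answer_relevancy.llm = vllm                    # LangchainLLM wrapper from above
answer_relevancy.embeddings = embedding_model  # assumes direct attribute assignment works

# test = answer_relevancy.score(dataset["train"])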
So I installed ragas from source with pip install "git+https://github.com/explodinggradients/ragas.git" and also ran pip install fastembed.
Setting the embedding works:
from ragas.embeddings import FastEmbedEmbeddings

model_name = "intfloat/multilingual-e5-large"
embedding_model = FastEmbedEmbeddings(model_name=model_name)
When calling answer_relevancy, I get this:
AnswerRelevancy(batch_size=15, llm=<ragas.llms.langchain.LangchainLLM object at 0x344ef9450>, name='answer_relevancy', evaluation_mode=<EvaluationMode.qac: 1>, strictness=3, embeddings=FastEmbedEmbeddings(model_name='intfloat/multilingual-e5-large', max_length=512, cache_dir=None, threads=None, doc_embed_type='default', cache_folder=None, _model=<fastembed.embedding.FlagEmbedding object at 0x10755cd90>))
But this had already worked with my own embeddings and my own model. The problem still occurs: my kernel dies when calling:
test = answer_relevancy.score(dataset["train"])
I will get back to this, but it's a tough bug to crack. Can you run the embedding in some sort of server, so that multiple requests can be sent to it? Would you be willing to try that out?
I'm having a similar issue and would be willing to try running the embedding on a server, but I'm not clear on what the code would look like on the ragas side. If I had a server running BAAI/bge-base-en-v1.5 and returning embeddings, what ragas embeddings object would I use?
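One possible shape for that (not an official ragas API, just a sketch): wrap the embedding server in a LangChain-compatible Embeddings class and pass it wherever ragas expects an embeddings object. The endpoint URL and JSON schema below are hypothetical placeholders for whatever the bge-base-en-v1.5 server actually exposes:
# Sketch: a LangChain-style Embeddings wrapper around a remote embedding server.
from typing import List

import requests
from langchain.embeddings.base import Embeddings


class RemoteEmbeddings(Embeddings):
    def __init__(self, url: str):
        self.url = url

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # assumes the server accepts {"inputs": [...]} and returns
        # {"embeddings": [[...], ...]} -- adjust to your server's schema
        resp = requests.post(self.url, json={"inputs": texts}, timeout=60)
        resp.raise_for_status()
        return resp.json()["embeddings"]

    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]


# hypothetical endpoint; afterwards e.g. answer_relevancy.embeddings = remote
# remote = RemoteEmbeddings("http://localhost:8080/embed")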
Hey @weissenbacherpwc and @normaljosh, this has been fixed in v0.1 and above; could you try it out?
Feel free to reopen in case you run into any other troubles.
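For anyone landing here later: with v0.1 the per-metric .llm / .embeddings patching is no longer needed, since evaluate() accepts the LLM and embeddings directly (the snippet further down in this thread uses the same pattern). A minimal sketch, with the local model path as a placeholder:
# Sketch of the v0.1-style API: pass a custom LLM and embeddings to evaluate().
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import LlamaCpp
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

llm = LlamaCpp(model_path="/path/to/model.gguf", n_ctx=3900)  # placeholder path
embeddings = HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-large")

result = evaluate(
    dataset["train"],  # the Dataset built earlier in the thread
    metrics=[faithfulness, answer_relevancy],
    llm=llm,
    embeddings=embeddings,
)
print(result)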
I am facing a similar issue when I try to generate a synthetic dataset using my custom LLM and embeddings. I am using TheBloke/Llama-2-13B-chat-GPTQ as the LLM and BAAI/bge-large-en-v1.5 for the embeddings.
same
Kernel dies when evaluating a custom dataset using a custom LLM and embeddings.
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.llms import LlamaCpp
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    answer_similarity,
    faithfulness,
)

# local LLM used for evaluation
eval_llm = LlamaCpp(
    model_path="./model/Mistral-7B-Instruct-v0.2/mistral-7b-instruct-v0.2.Q8_0.gguf",
    n_ctx=3900,
    n_gpu_layers=-1,
    n_batch=512,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=False,
)

# local BGE embeddings used for evaluation
eval_embed = HuggingFaceBgeEmbeddings(
    model_name='./model/bge-large-en-v1.5',
    model_kwargs={'device': 'cuda'},
    encode_kwargs={'normalize_embeddings': True},
)

result = evaluate(
    evalsets,
    metrics=[
        faithfulness,
        answer_correctness,
        answer_similarity,
        answer_relevancy,
    ],
    llm=eval_llm,
    embeddings=eval_embed,
)
I already opened an issue about this almost 2 weeks ago, but the bug has not been fixed yet.
Hi,
I want to evaluate my RAG application, and computing faithfulness, context_precision and context_recall with my own LLM (Llama based) works. But when I try to compute the answer_relevancy score, either the "OpenAI-Key" is not found (I don't want to use OpenAI) or my kernel dies. My DatasetDict is the one shown earlier in this thread, and I am loading my own embeddings, following the documentation.
If I run answer_relevancy without setting my own LLM, I get an "OpenAIKeyNotFound" error.
When I then set answer_relevancy to my specific LLM, my kernel crashes with: "Kernel crashed while executing code in the current cell or a previous cell".
At the moment the LLM and the embeddings are different models. When using the same LLM together with its own embeddings, the same issue occurs.
So any suggestions here?