explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

ModuleNotFoundError: No module named 'ragas.langchain' #571

Open xinyang-handalindah opened 8 months ago

xinyang-handalindah commented 8 months ago

Description of the bug: Using Google Colab. After running !pip install ragas, I am unable to import RagasEvaluatorChain from ragas.langchain.evalchain. It was okay last week (v0.0.22).

Ragas version: 0.1.0
Python version: 3.10.12

Code to Reproduce

from ragas.langchain.evalchain import RagasEvaluatorChain
from ragas.metrics import AnswerCorrectness

from ragas.metrics import (
    faithfulness,
    context_precision,
    context_recall
)

# Customise the weight of answer_correctness
answer_correctness = AnswerCorrectness(
    weights = [0.1, 0.9] # 10% factuality and 90% semantic similarity check.
)

def ragas_eval(result):
  """
  Define a chain to evaluate the result in terms of:
  - faithfulness
  - answer correctness
  - context precision
  - context recall

  Then, return the scores.

  """
  metrics = [faithfulness, answer_correctness,
             context_precision, context_recall]
  scores = {}

  for m in metrics:
    eval_result = RagasEvaluatorChain(metric=m)(result)
    scores[f"{m.name}_score"] = round(eval_result[m.name+'_score'],2)
  return scores

Error trace

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
[<ipython-input-13-d21d7383e19b>](https://localhost:8080/#) in <cell line: 1>()
----> 1 from ragas.langchain import RagasEvaluatorChain
      2 from ragas.metrics import AnswerCorrectness
      3 
      4 from ragas.metrics import (
      5     faithfulness,

ModuleNotFoundError: No module named 'ragas.langchain'

Expected behavior: Last week I was still able to import RagasEvaluatorChain from ragas.langchain.evalchain, but I encountered this error today.

shahules786 commented 8 months ago

Hey @xinyang-handalindah, ragas 0.1 does not yet have this feature. We are working on it; for now you have two options:

  1. Use ragas natively without the chain; this way you get all the new capabilities of the 0.1 version (a minimal sketch is at the end of this comment).
  2. Reinstall and use v0.0.22.

Also mentioned in #567
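
For option 1, here is a minimal sketch of the native v0.1 flow (editor's addition, not from the original comment; the placeholder values are assumptions, and the expected column names vary slightly across 0.1.x releases, e.g. ground_truth vs. ground_truths, so check the docs for your version):

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_correctness,
    context_precision,
    context_recall,
)

# build a Hugging Face Dataset with the columns ragas 0.1 expects
dataset = Dataset.from_dict({
    "question": ["..."],
    "answer": ["..."],
    "contexts": [["...", "..."]],
    "ground_truth": ["..."],
})

result = evaluate(
    dataset,
    metrics=[faithfulness, answer_correctness, context_precision, context_recall],
)
print(result)  # one aggregate score per metric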

leehanchung commented 7 months ago

Hey @xinyang-handalindah, ragas 0.1 does not yet have this feature. We are working on it; for now you have two options:

  1. Use ragas natively without the chain; this way you get all the new capabilities of the 0.1 version.
  2. Reinstall and use v0.0.22.

Also mentioned in #567

Do we have a pathway to migrate to the new API for:

  1. Working with LangSmith datasets?
  2. Running batch evaluations?
  3. Custom RAGAS evaluators?

Currently, here's the guidance from the blog post:

from langchain.smith import RunEvalConfig, run_on_dataset

evaluation_config = RunEvalConfig(
    custom_evaluators=list(eval_chains.values()),  # pass the chains themselves, not a nested list
    prediction_key="result",
)

result = run_on_dataset(
    client,
    dataset_name,
    create_qa_chain,
    evaluation=evaluation_config,
    input_mapper=lambda x: x,
)

Now it's completely detached, as shown here:

from datasets import load_dataset
from ragas.metrics import context_precision, answer_relevancy, faithfulness
from ragas import evaluate

fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")

result = evaluate(
    fiqa_eval["baseline"].select(range(3)),
    metrics=[context_precision, faithfulness, answer_relevancy],
)

jjmachan commented 7 months ago

We are actually tracking this in #567; we want to make sure we add back compatibility with the original use case. Do give us some time and we will get to it.

In the meantime you could:

  1. Loop through your chain and generate the Dataset for evaluations.
  2. Use ragas v0.0.22.

18abhi89 commented 7 months ago

@jjmachan could you please also point me to the code sample for a workaround until this bug is fixed?

I am trying to run the example from here with 0.0.22 but am not able to.

Kirushikesh commented 7 months ago

@jjmachan, I was facing the same problem. I even tried ragas v0.0.22 but am getting a different error:

from ragas.metrics import faithfulness, answer_relevancy, context_relevancy, context_recall
from ragas.langchain.evalchain import RagasEvaluatorChain

# make eval chains
eval_chains = {
    m.name: RagasEvaluatorChain(metric=m) 
    for m in [faithfulness, answer_relevancy, context_relevancy, context_recall]
}

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[10], line 5
      2 from ragas.langchain.evalchain import RagasEvaluatorChain
      4 # make eval chains
----> 5 eval_chains = {
      6     m.name: RagasEvaluatorChain(metric=m) 
      7     for m in [faithfulness, answer_relevancy, context_relevancy, context_recall]
      8 }

Cell In[10], line 6, in <dictcomp>(.0)
      2 from ragas.langchain.evalchain import RagasEvaluatorChain
      4 # make eval chains
      5 eval_chains = {
----> 6     m.name: RagasEvaluatorChain(metric=m) 
      7     for m in [faithfulness, answer_relevancy, context_relevancy, context_recall]
      8 }

File ~/.conda/envs/guardrails1/lib/python3.9/site-packages/ragas/langchain/evalchain.py:29, in RagasEvaluatorChain.__init__(self, **kwargs)
     27 def __init__(self, **kwargs: t.Any):
     28     super().__init__(**kwargs)
---> 29     self.metric.init_model()

AttributeError: 'Faithfulness' object has no attribute 'init_model'

dmpiergiacomo commented 7 months ago

Hi @xinyang-handalindah and @Kirushikesh,

I fixed it by installing ragas==0.0.11. The LangChain article was written on August 23, 2023, and Ragas v0.0.11 was released the day after.

If you use v0.0.11, also make sure to remove context_precision, as it was not yet supported back then.
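
For reference, pinning to that release is just a version-pinned install (prefix with ! in a notebook cell):

pip install ragas==0.0.11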

jjmachan commented 7 months ago

@18abhi89, @Kirushikesh @dmpiergiacomo - hey 🙂

I would recommend using v0.1 for the latest metrics, but I know that without LangChain support you're stuck there.

For the time being, this notebook has the workaround code.

Basically it is:

# run langchain RAG
answers = []
contexts = []

for question in test_questions:
  response = retrieval_augmented_qa_chain.invoke({"question" : question})
  answers.append(response["response"].content)
  contexts.append([context.page_content for context in response["context"]])

# make HF dataset
from datasets import Dataset

response_dataset = Dataset.from_dict({
    "question" : test_questions,
    "answer" : answers,
    "contexts" : contexts,
    "ground_truth" : test_groundtruths
})

# now you can run the evaluations
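# for example, an editor's sketch assuming the v0.1 evaluate() API and API keys configured:
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_recall, context_precision

results = evaluate(
    response_dataset,
    metrics=[faithfulness, answer_relevancy, context_recall, context_precision],
)
print(results)  # aggregate score per metric
# results.to_pandas() gives per-row scores if you want to inspect individual samples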

Can you try it out and see if it solves the problem?

I'm really sorry about the delay, guys; we will get this sorted as fast as we can.

Kirushikesh commented 7 months ago

@jjmachan, I was getting a different error when executing the above code with a non-OpenAI LLM; can you look at #656?

dmpiergiacomo commented 7 months ago

@Kirushikesh I was getting the same error.

Kirushikesh commented 7 months ago

@dmpiergiacomo which LLM are you using and what is the error?

sultf2 commented 7 months ago

@jjmachan I tried to execute your suggestion and I get the following error. Because of the recursion, this cost me ~$490 in API calls before the error message appeared; it was using GPT-4, not GPT-4 Turbo, just to warn others tempted to try this.

 File "/Users/faisals/anaconda3/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 516, in <listcomp>
    LLMResult(generations=[res.generations], llm_output=res.llm_output)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1076, in pydantic.main.validate_model
  File "pydantic/fields.py", line 895, in pydantic.fields.ModelField.validate
  File "pydantic/fields.py", line 928, in pydantic.fields.ModelField._validate_sequence_like
  File "pydantic/fields.py", line 1094, in pydantic.fields.ModelField._validate_singleton
  File "pydantic/fields.py", line 895, in pydantic.fields.ModelField.validate
  File "pydantic/fields.py", line 928, in pydantic.fields.ModelField._validate_sequence_like
  File "pydantic/fields.py", line 1094, in pydantic.fields.ModelField._validate_singleton
  File "pydantic/fields.py", line 884, in pydantic.fields.ModelField.validate
  File "pydantic/fields.py", line 1101, in pydantic.fields.ModelField._validate_singleton
  File "pydantic/fields.py", line 1157, in pydantic.fields.ModelField._apply_validators
  File "pydantic/class_validators.py", line 337, in pydantic.class_validators._generic_validator_basic.lambda13
  File "pydantic/main.py", line 684, in pydantic.main.BaseModel.validate
  File "pydantic/main.py", line 304, in pydantic.main.ModelMetaclass.__instancecheck__
  File "<frozen abc>", line 119, in __instancecheck__
RecursionError: maximum recursion depth exceeded in comparison
Traceback (most recent call last):

  File "<ipython-input-14-6bb4ddca3d40>", line 6, in <module>
    testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

  File "/Users/faisals/anaconda3/lib/python3.11/site-packages/ragas/testset/generator.py", line 156, in generate_with_langchain_docs
    return self.generate(

  File "/Users/faisals/anaconda3/lib/python3.11/site-packages/ragas/testset/generator.py", line 249, in generate
    raise ExceptionInRunner()

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass `raise_exceptions=False` incase you want to show only a warning message instead.

solita-jonas commented 5 months ago

I'm getting the same error on Ragas 0.1.7. Is it still necessary to downgrade to 0.0.22 for this to work?

SuperTyrael commented 5 months ago

Getting the same error on Ragas 0.1.7

Fisseha-Estifanos commented 4 months ago

Hi @xinyang-handalindah and @Kirushikesh,

I fixed it by installing ragas==0.0.11. The LangChain article was written on August 23, 2023, and Ragas v0.0.11 was released the day after.

If you use v0.0.11, also make sure to remove context_precision, as it was not yet supported back then.

You are a saint!

allenwu5 commented 4 months ago

With ragas==0.1.9, I tried the following and it works for me:

from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness

faithfulness_chain = EvaluatorChain(metric=faithfulness)

sample = {
    "question": "...",
    "answer": "...",
    "contexts": ["...", "..."],
    "ground_truth": "...",
}
eval_result = faithfulness_chain(sample)

where "..." is your data

dqminhv commented 1 month ago

With ragas==0.1.14, I made it work with:

from ragas.metrics import faithfulness, answer_relevancy, context_recall, context_precision
from ragas.integrations.langchain import EvaluatorChain

# make eval chains
eval_chains = {
    m.name: EvaluatorChain(metric=m) 
    for m in [faithfulness, answer_relevancy, context_precision, context_recall]
}

context_relevancy should be changed to context_precision because I do not see any context_relevancy module in the ragas package, and context_precision seems to be the same idea as context_relevancy. Also, import EvaluatorChain, not RagasEvaluatorChain.
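
A quick usage sketch (editor's addition, reusing the sample dict shape from allenwu5's comment above; the "..." values are placeholders for your data):

sample = {
    "question": "...",
    "answer": "...",
    "contexts": ["...", "..."],
    "ground_truth": "...",
}
# run every chain on the sample; each call returns a dict containing that metric's score
scores = {name: chain(sample) for name, chain in eval_chains.items()}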

shahules786 commented 1 month ago

Hey @dqminhv, thanks for pitching in, my friend. You're right regarding context_relevancy; it was deprecated since 0.1 in favor of context_precision and was removed recently. If you are looking for a reference-free metric to evaluate retrieval accuracy, check out context_utilization.
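
For example, a minimal sketch (editor's addition; it assumes context_utilization is exported by ragas.metrics in your release, so check your version's docs):

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_utilization

# reference-free: only question / answer / contexts are needed, no ground_truth column
dataset = Dataset.from_dict({
    "question": ["..."],
    "answer": ["..."],
    "contexts": [["...", "..."]],
})
result = evaluate(dataset, metrics=[context_utilization])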

@jjmachan is this issue resolved?