explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

Cannot generate synthetic data using documented code, gives error #1208

Open dividor opened 3 weeks ago

dividor commented 3 weeks ago

[x] I have checked the documentation and related resources and couldn't resolve my bug.

**Describe the bug**
I am following this RAGAs documentation to generate test data, but I get a module import error.

Ragas version: 0.1.14
Python version: 3.11.4

**Code to Reproduce**
Code from RAGAs documentation ...

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-4o")
critic_llm = ChatOpenAI(model="gpt-4o")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

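# generate testset
# (docs is assumed to be a list of LangChain documents loaded earlier; it is not defined in this snippet)
testset = generator.generate_with_langchain_docs(docs, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})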

**Error trace**

ImportError                               Traceback (most recent call last)
Cell In[59], line 1
----> 1 from ragas.testset.generator import TestsetGenerator
      2 from ragas.testset.evolutions import simple, reasoning, multi_context
      3 from langchain_openai import ChatOpenAI, OpenAIEmbeddings

File ~/opt/miniconda3/envs/eval_demo/lib/python3.11/site-packages/ragas/testset/__init__.py:1
----> 1 from ragas.testset.generator import TestsetGenerator
      3 __all__ = ["TestsetGenerator"]

File ~/opt/miniconda3/envs/eval_demo/lib/python3.11/site-packages/ragas/testset/generator.py:13
     10 from langchain_openai.chat_models import ChatOpenAI
     11 from langchain_openai.embeddings import OpenAIEmbeddings
---> 13 from ragas._analytics import TestsetGenerationEvent, track, track_was_completed
     14 from ragas.embeddings.base import (
     15     BaseRagasEmbeddings,
     16     LangchainEmbeddingsWrapper,
     17     LlamaIndexEmbeddingsWrapper,
     18 )
     19 from ragas.exceptions import ExceptionInRunner

ImportError: cannot import name 'TestsetGenerationEvent' from 'ragas._analytics' (/Users/matthewharris/opt/miniconda3/envs/eval_demo/lib/python3.11/site-packages/ragas/_analytics.py)

**Expected behavior**
A clear and concise description of what you expected to happen.

**Additional context**
Thanks!

dosubot[bot] commented 3 weeks ago

To resolve the ImportError when trying to generate synthetic data using the RAGAs documentation code with Ragas version 0.1.14 and Python version 3.11.4, you need to ensure that the _version module is available and correctly imported. The provided code snippet from src/ragas/__init__.py shows that the import of _version is wrapped in a try-except block, which sets __version__ to "unknown version" if the import fails.

Here is the relevant part of the code:

try:
    from ._version import version as __version__
except ImportError:
    __version__ = "unknown version"

To fix the ImportError, you should check the following:

  1. Ensure that the _version.py file exists in the ragas directory.
  2. Verify that the _version.py file contains a version variable.

For example, the _version.py file should look something like this:

# src/ragas/_version.py
version = "0.1.14"

If the _version.py file is missing or incorrectly named, create it with the appropriate content. This should resolve the ImportError and allow you to proceed with generating synthetic data using the RAGAs documentation code.
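Before editing any files, it may help to confirm what is actually installed in the active environment. The following is a minimal sketch using only the Python standard library: it reports the installed `ragas` distribution version and checks whether `ragas._analytics` exposes the name from the traceback. The helper names `installed_version` and `module_has_name` are illustrative and are not part of Ragas.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_has_name(module_name, attr):
    """Return True or False if the module imports, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


if __name__ == "__main__":
    # Version recorded in the package metadata of the active environment.
    print("ragas distribution version:", installed_version("ragas"))
    # The name that the traceback says could not be imported.
    print(
        "ragas._analytics exposes TestsetGenerationEvent:",
        module_has_name("ragas._analytics", "TestsetGenerationEvent"),
    )
```

If the second check prints `False`, the `_analytics.py` file on disk does not define the name that `generator.py` expects, which would be consistent with the two files coming from different installs. That is a possible reading of the result, not a confirmed diagnosis.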

Additionally, ensure you have installed the Ragas package correctly. You can install it using pip:

pip install ragas

Or, if you prefer to install it from the source:

pip install git+https://github.com/explodinggradients/ragas
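In a notebook, a bare `pip` command can target a different environment than the one the kernel is running in. A minimal sketch, assuming you want to install into the interpreter that executes the cell, is to invoke pip through `sys.executable`. The helper names `pip_install_command` and `pip_install` are hypothetical and shown only for illustration.

```python
import subprocess
import sys


def pip_install_command(requirement, upgrade=True):
    """Build a pip command that targets the interpreter running this code."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(requirement)
    return cmd


def pip_install(requirement, upgrade=True):
    """Run pip in a subprocess and return its exit code (0 means success)."""
    return subprocess.run(pip_install_command(requirement, upgrade)).returncode
```

For example, `pip_install("ragas")` runs the install against the current interpreter. Restart the kernel afterwards so that already-imported modules are reloaded.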

Here is a small example program from the Ragas documentation to see it in action:

from datasets import Dataset 
import os
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness

os.environ["OPENAI_API_KEY"] = "your-openai-key"

data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'contexts': [
        ['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'],
        ['The Green Bay Packers...Green Bay, Wisconsin.', 'The Packers compete...Football Conference'],
    ],
    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times']
}

dataset = Dataset.from_dict(data_samples)

score = evaluate(dataset, metrics=[faithfulness, answer_correctness])
score.to_pandas()

Refer to the Ragas documentation for more details [1][2].
