explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0
6.55k stars, 643 forks

ValueError: a cannot be empty unless no samples are taken #1194

Open Qin-xb opened 1 month ago

Qin-xb commented 1 month ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

ValueError: a cannot be empty unless no samples are taken

Describe the bug
A clear and concise description of what the bug is.

Ragas version:
Python version:

Code to Reproduce
Share code to reproduce the issue

from llama_index.core import SimpleDirectoryReader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

documents = SimpleDirectoryReader(CONFIG.file_path).load_data()
print("load file num:", len(documents))

generator_llm = ChatOpenAI(
    model=CONFIG.generator_llm_model,
    api_key=CONFIG.generator_llm_api_key,
    base_url=CONFIG.generator_llm_base_url,
)

critic_llm = ChatOpenAI(
    model=CONFIG.critic_llm_model,
    api_key=CONFIG.critic_llm_api_key,
    base_url=CONFIG.critic_llm_base_url,
)

model_kwargs = {"device": CONFIG.embed_model_gpu}
encode_kwargs = {"normalize_embeddings": True}
embeddings = HuggingFaceBgeEmbeddings(
    model_name=CONFIG.embed_model,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
)

# generate testset
testset = generator.generate_with_llamaindex_docs(
    documents,
    test_size=CONFIG.generate_question_size,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    with_debugging_logs=True,
)

test_df = testset.to_pandas()
test_df.to_excel(CONFIG.save_file_path, index=False)

Error trace

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

dosubot[bot] commented 1 month ago

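The error "ValueError: a cannot be empty unless no samples are taken" is raised by NumPy's random choice function when it is asked to draw one or more samples from an empty sequence. (Python's built-in random.sample reports a different message.) Here it most likely means that the docstore held no nodes at the point where the generator tried to pick random nodes, for example because no usable documents were loaded. A test_size larger than the number of available nodes is also worth ruling out.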
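To see where this message comes from, it can be reproduced outside of ragas. The following is a minimal sketch, assuming NumPy is installed; try_sample is a hypothetical helper written for this illustration and is not part of ragas. It shows that sampling from an empty population fails with this message, while sampling more items than the population holds succeeds, because NumPy samples with replacement by default.

```python
import numpy as np

# Fixed seed so repeated runs behave the same way.
rng = np.random.default_rng(seed=0)


def try_sample(population, k):
    """Draw k random items; return the error text if sampling fails.

    Hypothetical helper for illustration only, not ragas code.
    """
    try:
        return rng.choice(np.array(population), size=k).tolist()
    except ValueError as exc:
        return f"ValueError: {exc}"


# Empty population with k > 0: NumPy refuses to sample.
print(try_sample([], 3))

# Two items with k = 3: works, since sampling is with replacement.
print(try_sample(["node-a", "node-b"], 3))
```

If the first call reports the same message as the issue title, the thing to investigate is why the population (here, the set of nodes) ended up empty.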

To resolve this issue, ensure that the docstore contains enough nodes to match the test_size. You can add a check to handle cases where there are not enough nodes:

current_nodes = [
    CurrentNodes(root_node=n, nodes=[n])
    for n in self.docstore.get_random_nodes(k=test_size)
]

if len(current_nodes) < test_size:
    raise ValueError("Not enough nodes in the docstore to generate the test set.")

Additionally, ensure that the documents list is not empty before passing it to the TestsetGenerator. You can add a check to verify that documents are loaded successfully:

documents = SimpleDirectoryReader(CONFIG.file_path).load_data()

# Check if documents are loaded
if not documents:
    raise ValueError("No documents were loaded. Please check your file path or parameters.")

print("load file num:", len(documents))

This will ensure that you do not attempt to generate a test set with an empty list of documents [1][2].
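A non-empty documents list can still yield no usable nodes if every document has empty text (for instance, files from which no text could be extracted). The following is a rough sketch of a stricter pre-flight check, under the assumption that each loaded document exposes its content through a text attribute; Doc and usable_documents are hypothetical names introduced here, not ragas or LlamaIndex APIs.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document with a `text` field."""
    text: str


def usable_documents(documents, min_chars=1):
    """Keep documents whose stripped text has at least `min_chars` characters.

    Raises ValueError when nothing usable remains, so the failure is
    reported before test set generation starts.
    """
    kept = [
        doc for doc in documents
        if len((getattr(doc, "text", "") or "").strip()) >= min_chars
    ]
    if not kept:
        raise ValueError(
            "No documents with usable text were loaded. "
            "Please check your file path or parameters."
        )
    return kept


docs = usable_documents([Doc("Some real content."), Doc("   ")])
print("usable file num:", len(docs))
```

Passing the filtered list to the generator instead of the raw one makes an empty input fail early with a clearer message than the sampling error.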
