Qin-xb opened this issue 1 month ago (status: Open)
The `ValueError: a cannot be empty unless no samples are taken` error occurs when a random-sampling call is made against an empty sequence. (The message text itself comes from NumPy's random `choice`, which is used for node sampling, rather than the standard-library `random` module.) This can happen if `test_size` is greater than the number of available nodes in the docstore.
To resolve this issue, ensure that the docstore contains enough nodes to match `test_size`. You can add a check to handle cases where there are not enough nodes:
```python
current_nodes = [
    CurrentNodes(root_node=n, nodes=[n])
    for n in self.docstore.get_random_nodes(k=test_size)
]
if len(current_nodes) < test_size:
    raise ValueError("Not enough nodes in the docstore to generate the test set.")
```
Additionally, ensure that the `documents` list is not empty before passing it to the `TestsetGenerator`. You can add a check to verify that documents are loaded successfully:
```python
documents = SimpleDirectoryReader(CONFIG.file_path).load_data()

# Check if documents are loaded
if not documents:
    raise ValueError("No documents were loaded. Please check your file path or parameters.")
print("load file num:", len(documents))
```
This will ensure that you do not attempt to generate a test set with an empty list of documents [1][2].
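Putting both checks together, a pre-flight validation before calling the generator can fail fast with an actionable message instead of the opaque NumPy error. This is a heuristic sketch (Ragas samples chunked *nodes*, not raw documents, so a document-count check is only an approximation; the helper name `check_corpus_size` is hypothetical):

```python
def check_corpus_size(documents, test_size):
    """Fail early with a clear message before test-set generation.

    Heuristic only: documents are later split into nodes, so having at
    least `test_size` documents does not strictly guarantee enough nodes.
    """
    if not documents:
        raise ValueError("No documents were loaded. Please check your file path.")
    if len(documents) < test_size:
        raise ValueError(
            f"Only {len(documents)} documents loaded for test_size={test_size}; "
            "reduce test_size or load more documents."
        )

# e.g. check_corpus_size(documents, CONFIG.generate_question_size)
```

Calling this right after `load_data()` surfaces the real problem (too few documents, or a wrong path) at the point where it is easiest to fix.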
- [ ] I have checked the documentation and related resources and couldn't resolve my bug.

**ValueError: a cannot be empty unless no samples are taken**
**Describe the bug**
A clear and concise description of what the bug is.

**Ragas version:**
**Python version:**

**Code to Reproduce**
Share code to reproduce the issue:
```python
from llama_index.core import SimpleDirectoryReader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

documents = SimpleDirectoryReader(CONFIG.file_path).load_data()
print("load file num:", len(documents))

generator_llm = ChatOpenAI(
    model=CONFIG.generator_llm_model,
    api_key=CONFIG.generator_llm_api_key,
    base_url=CONFIG.generator_llm_base_url,
)

critic_llm = ChatOpenAI(
    model=CONFIG.critic_llm_model,
    api_key=CONFIG.critic_llm_api_key,
    base_url=CONFIG.critic_llm_base_url,
)

model_kwargs = {"device": CONFIG.embed_model_gpu}
encode_kwargs = {"normalize_embeddings": True}
embeddings = HuggingFaceBgeEmbeddings(
    model_name=CONFIG.embed_model,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)

# generate testset
testset = generator.generate_with_llamaindex_docs(
    documents,
    test_size=CONFIG.generate_question_size,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    with_debugging_logs=True,
)

test_df = testset.to_pandas()
test_df.to_excel(CONFIG.save_file_path, index=False)
```
**Error trace**

**Expected behavior**
A clear and concise description of what you expected to happen.

**Additional context**
Add any other context about the problem here.