explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Test dataset Issue #1430

Open nandhu-nr opened 2 weeks ago

nandhu-nr commented 2 weeks ago

WARNING:ragas.testset.docstore:Filename and doc_id are the same for all nodes.

ValueError                                Traceback (most recent call last)
in <cell line: 11>()
      9 
     10 # generate testset
---> 11 testset1 = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}, raise_exceptions=False)

3 frames
/usr/local/lib/python3.10/dist-packages/ragas/testset/docstore.py in get_random_nodes(self, k, alpha)
    325         prob = prob / np.sum(prob)
    326 
--> 327         nodes = self.run_config.rng.choice(
    328             np.array(self.nodes), size=k, p=prob
    329         ).tolist()

numpy/random/_generator.pyx in numpy.random._generator.Generator.choice()

ValueError: a cannot be empty unless no samples are taken

dosubot[bot] commented 2 weeks ago

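The error you're encountering, ValueError: a cannot be empty unless no samples are taken, is raised by NumPy's Generator.choice when it is asked to draw a non-zero number of samples from an empty array. In your traceback that array is np.array(self.nodes) inside get_random_nodes, so the docstore held no nodes at all when sampling started. This points to an empty docstore rather than one that is merely smaller than test_size.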
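To make the failure mode concrete, here is a minimal sketch that reproduces the same NumPy behavior outside of ragas. The node names are purely illustrative and nothing here calls ragas itself; it only assumes NumPy is installed.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Asking for more samples than the population holds is fine, because
# Generator.choice samples with replacement by default.
picked = rng.choice(np.array(["node_a", "node_b"]), size=5).tolist()
print(len(picked))

# Sampling from an empty population is what raises the reported error.
try:
    rng.choice(np.array([]), size=5)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

If the second call is what fails in your run, the thing to investigate is why no nodes were created from your documents, not the value of test_size.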

To resolve this issue, ensure that the docstore actually contains nodes before sampling from it. You can add a check that fails early with a clearer message:

# Fail early if the docstore has no nodes to sample from
# (nodes is the list that get_random_nodes reads in the traceback above)
if not self.docstore.nodes:
    raise ValueError("Not enough nodes in the docstore to generate the test set.")

current_nodes = [
    CurrentNodes(root_node=n, nodes=[n])
    for n in self.docstore.get_random_nodes(k=test_size)
]

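Additionally, make sure that the documents list is not empty before passing it to the TestsetGenerator. The snippet below loads files with LlamaIndex's SimpleDirectoryReader; the same check applies to documents loaded with a LangChain loader, as in your call to generate_with_langchain_docs:

# Import path for recent llama-index releases; older versions use `from llama_index import SimpleDirectoryReader`
from llama_index.core import SimpleDirectoryReader

# CONFIG.file_path is a placeholder for the directory that holds your source files
documents = SimpleDirectoryReader(CONFIG.file_path).load_data()

# Check if documents are loaded
if not documents:
    raise ValueError("No documents were loaded. Please check your file path or parameters.")

print("Number of documents loaded:", len(documents))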

This will ensure that you do not attempt to generate a test set with an empty list of documents [1].
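Since your call uses generate_with_langchain_docs, the same idea can be sketched for LangChain-style documents. This is only an illustration: Doc is a hypothetical stand-in that mirrors the page_content and metadata attributes of a LangChain Document, and check_documents is a helper name made up for this example, not part of ragas or LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in exposing the attributes of a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def check_documents(documents):
    """Return the documents that contain text, raising early if none do."""
    if not documents:
        raise ValueError("No documents were loaded. Please check your file path or parameters.")
    # Drop documents whose content is empty or whitespace only.
    usable = [d for d in documents if d.page_content and d.page_content.strip()]
    if not usable:
        raise ValueError("All loaded documents are empty.")
    return usable


docs = [Doc("Some real text."), Doc("   "), Doc("")]
usable_docs = check_documents(docs)
print(len(usable_docs))
```

In your notebook you would run documents = check_documents(documents) just before the generate_with_langchain_docs call, so an empty input fails with a readable message instead of the NumPy error.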
