explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0
7.37k stars 749 forks

"ValueError: a cannot be empty unless no samples are taken" occurred when I used Local Ollama models #1265

Open zhi-jiao opened 2 months ago

zhi-jiao commented 2 months ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug "ValueError: a cannot be empty unless no samples are taken"

Ragas version: 0.1.16
Python version: 3.11.9

Code to Reproduce

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_community.llms import Ollama
from langchain_community.document_loaders import DirectoryLoader

from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings(model='nomic-embed-text')
loader = DirectoryLoader("./data")
documents = loader.load()

# generator and critic with local Ollama models
generator_llm = Ollama(model="gemma:2b")
critic_llm = Ollama(model="gemma:2b")
embeddings = OllamaEmbeddings(model='nomic-embed-text')

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# generate testset
testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

Error trace

Traceback (most recent call last):
  File "/home/zhsu20/Dunhuang_LLM/1.py", line 24, in <module>
    testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhsu20/miniconda3/envs/open-webui-env/lib/python3.11/site-packages/ragas/testset/generator.py", line 211, in generate_with_langchain_docs
    return self.generate(
           ^^^^^^^^^^^^^^
  File "/home/zhsu20/miniconda3/envs/open-webui-env/lib/python3.11/site-packages/ragas/_analytics.py", line 129, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhsu20/miniconda3/envs/open-webui-env/lib/python3.11/site-packages/ragas/testset/generator.py", line 281, in generate
    for n in self.docstore.get_random_nodes(k=test_size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zhsu20/miniconda3/envs/open-webui-env/lib/python3.11/site-packages/ragas/testset/docstore.py", line 328, in get_random_nodes
    nodes = self.run_config.rng.choice(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "numpy/random/_generator.pyx", line 803, in numpy.random._generator.Generator.choice
ValueError: a cannot be empty unless no samples are taken

Additional context: I checked the data used in this code. The node list is empty even though the documents have been read.

dosubot[bot] commented 2 months ago

Hey @zhi-jiao! 👋 I'm here to help you with debugging, answering questions, and becoming a contributor. Let's squash those bugs together!

There are several similar issues related to the ValueError: a cannot be empty unless no samples are taken error. Here are some relevant discussions and solutions:

  1. Issue #871 (open):

    • This issue discusses the same error and suggests ensuring that the docstore contains enough nodes to match the test_size. It also recommends checking that the documents list is not empty before passing it to the TestsetGenerator [1].
  2. Issue #1194 (open):

    • The suggested solution is to ensure that the docstore contains enough nodes to match the test_size and to verify that documents are loaded successfully [2].
  3. Issue #1109 (open):

    • This issue also suggests ensuring that the documents list is not empty before passing it to the TestsetGenerator [3].
  4. Issue #625 (closed):

    • This issue was closed by a pull request that fixed an adaptor issue for the keyphrase extractor [4].

To resolve your issue, ensure that the documents list is not empty and that the docstore contains enough nodes to match the test_size. Here are some steps you can take:

  1. Check if Documents are Loaded:

    from langchain_community.document_loaders import DirectoryLoader

    loader = DirectoryLoader("./data")
    documents = loader.load()

    # Stop early if the loader returned nothing
    if not documents:
        raise ValueError("No documents were loaded. Please check your file path or parameters.")

    print("Number of documents loaded:", len(documents))
  2. Ensure Enough Nodes in Docstore:

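    # This snippet lives inside ragas/testset/generator.py (see the traceback),
    # so `self` is the TestsetGenerator instance.
    # Note: with an empty docstore, get_random_nodes() raises the ValueError
    # itself, so the length check below is only reached when nodes exist.
    current_nodes = [
        CurrentNodes(root_node=n, nodes=[n])
        for n in self.docstore.get_random_nodes(k=test_size)
    ]

    if len(current_nodes) < test_size:
        raise ValueError("Not enough nodes in the docstore to generate the test set.")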

By implementing these checks, you can ensure that you do not attempt to generate a test set with an empty list of documents or insufficient nodes in the docstore.
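A non-empty documents list does not by itself guarantee usable text: a loader can return a document whose page_content is empty or only whitespace. The following is a minimal sketch of that extra check, assuming each document exposes page_content the way LangChain documents do. Doc and split_usable_documents are hypothetical names used only for illustration; they are not part of ragas or LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_usable_documents(documents, min_chars=1):
    """Split documents into (usable, skipped).

    A document counts as usable when its stripped page_content has at
    least min_chars characters.
    """
    usable, skipped = [], []
    for doc in documents:
        text = (doc.page_content or "").strip()
        if len(text) >= min_chars:
            usable.append(doc)
        else:
            skipped.append(doc)
    return usable, skipped


# Example input: one document with text, one that is effectively empty.
docs = [
    Doc("Some example text.", {"source": "a.txt"}),
    Doc("   \n", {"source": "empty.txt"}),
]
usable, skipped = split_usable_documents(docs)
print(f"usable={len(usable)} skipped={len(skipped)}")
for doc in skipped:
    print("no text extracted from:", doc.metadata.get("source"))
```

If every loaded document ends up in skipped, the problem is likely in loading or parsing rather than in test set generation.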


zhi-jiao commented 2 months ago

I added the checks you suggested. Here is the output:

Number of documents loaded: 1
Filename and doc_id are the same for all nodes.
[]

Now I'm sure that all the data have been loaded.

However, the error still occurs in the code below:

        current_nodes = [
            CurrentNodes(root_node=n, nodes=[n])
            for n in self.docstore.get_random_nodes(k=test_size)
        ]

I can't check whether there are enough nodes in the docstore.
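One way to see the node count despite the exception is to catch the ValueError around the generation call and inspect the docstore afterwards. The sketch below rests on two assumptions: the generator exposes a docstore attribute (the traceback shows self.docstore), and that docstore keeps its nodes in a list attribute named nodes, which is not confirmed in this thread. count_docstore_nodes and run_with_node_report are hypothetical helpers, and stand-in objects are used so the example runs without ragas installed.

```python
from types import SimpleNamespace


def count_docstore_nodes(generator):
    """Return len(generator.docstore.nodes), or None if that layout is absent.

    Assumption: the docstore stores its nodes in a list attribute `nodes`.
    """
    docstore = getattr(generator, "docstore", None)
    nodes = getattr(docstore, "nodes", None)
    return None if nodes is None else len(nodes)


def run_with_node_report(generator, generate, test_size):
    """Call generate(); on ValueError, re-raise with the node count attached."""
    try:
        return generate()
    except ValueError as exc:
        count = count_docstore_nodes(generator)
        raise ValueError(
            f"generation failed with {count} docstore node(s) "
            f"for test_size={test_size}: {exc}"
        ) from exc


# Stand-ins for the real generator and generation call (not ragas objects).
fake_generator = SimpleNamespace(docstore=SimpleNamespace(nodes=[]))


def fake_generate():
    raise ValueError("a cannot be empty unless no samples are taken")


try:
    run_with_node_report(fake_generator, fake_generate, test_size=10)
except ValueError as exc:
    print(exc)
```

With the real objects, generate could be a lambda wrapping the generator.generate_with_langchain_docs(...) call from the reproduction script. A reported count of 0 would indicate that no nodes were created from the documents before sampling started.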