Open zhi-jiao opened 2 months ago
Hey @zhi-jiao! I'm here to help you with debugging, answering questions, and becoming a contributor. Let's squash those bugs together!
There are several similar issues related to the error "ValueError: a cannot be empty unless no samples are taken". Here are some relevant discussions and solutions:
Issue #871 (open): The suggestion is to ensure that the docstore contains enough nodes to match the test_size. It also recommends checking that the documents list is not empty before passing it to the TestsetGenerator [1].
Issue #1194 (open): The suggestion is to ensure that the docstore contains enough nodes to match the test_size and to verify that documents are loaded successfully [2].
Issue #1109 (open): The suggestion is to check that the documents list is not empty before passing it to the TestsetGenerator [3].
Issue #625 (closed)
To resolve your issue, ensure that the documents list is not empty and that the docstore contains enough nodes to match the test_size. Here are some steps you can take:
Check if Documents are Loaded:
from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader("./data")
documents = loader.load()

# Check if documents are loaded
if not documents:
    raise ValueError("No documents were loaded. Please check your file path or parameters.")
print("Number of documents loaded:", len(documents))
Ensure Enough Nodes in Docstore:
current_nodes = [
    CurrentNodes(root_node=n, nodes=[n])
    for n in self.docstore.get_random_nodes(k=test_size)
]
if len(current_nodes) < test_size:
    raise ValueError("Not enough nodes in the docstore to generate the test set.")
By implementing these checks, you can ensure that you do not attempt to generate a test set with an empty list of documents or insufficient nodes in the docstore.
To continue talking to Dosu, mention @dosu.
I have added the code you suggested. Here is the output:
Number of documents loaded: 1
Filename and doc_id are the same for all nodes.
[]
Now I'm sure that all the data has been loaded.
However, the bug occurred in the code below:
current_nodes = [
    CurrentNodes(root_node=n, nodes=[n])
    for n in self.docstore.get_random_nodes(k=test_size)
]
I can't check whether there are enough nodes in the docstore.
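Because the comprehension above runs inside the library, the length check would have to happen from the calling code instead. The following is a minimal sketch of that idea, assuming the docstore object exposes its nodes as a list-like attribute named nodes; that attribute name is an assumption about Ragas 0.1.x and is not confirmed anywhere in this thread. FakeDocstore and ensure_enough_nodes are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class FakeDocstore:
    """Hypothetical stand-in for a docstore that keeps its nodes in a list."""

    nodes: List[Any] = field(default_factory=list)


def ensure_enough_nodes(docstore: Any, test_size: int) -> int:
    """Return the node count, or raise a descriptive error if it is too small.

    Assumes the docstore exposes a list-like `nodes` attribute (unverified).
    """
    nodes = getattr(docstore, "nodes", None)
    if nodes is None:
        # The assumption above does not hold for this object.
        raise AttributeError("docstore has no 'nodes' attribute to inspect")
    if len(nodes) < test_size:
        raise ValueError(
            f"docstore holds {len(nodes)} node(s) but test_size is {test_size}"
        )
    return len(nodes)
```

If the assumption holds, calling ensure_enough_nodes on the real docstore once the documents have been added to it would report a node count of 0 with a readable message, instead of the less obvious "a cannot be empty" error raised later during sampling.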
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
"ValueError: a cannot be empty unless no samples are taken"
Ragas version: 0.1.16
Python version: 3.11.9
Code to Reproduce
Error trace
Additional context
I have checked the data used in this code: the node list is empty even though the documents have been read.
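When documents load but the node list stays empty, a useful first step is to look at how much text each loaded document actually contains. The following is a minimal sketch, assuming each document exposes its text as page_content (as LangChain documents do). StubDocument, summarize_documents, and find_blank_documents are hypothetical names, and a blank or very short document is only one possible explanation for an empty node list, not a confirmed cause of this bug.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class StubDocument:
    """Hypothetical stand-in for a loaded document with a text payload."""

    page_content: str


def summarize_documents(documents: Iterable[Any]) -> List[Dict[str, int]]:
    """Report the stripped character count and word count of each document."""
    summary = []
    for index, doc in enumerate(documents):
        # Treat a missing or None page_content as empty text.
        text = getattr(doc, "page_content", "") or ""
        summary.append(
            {"index": index, "chars": len(text.strip()), "words": len(text.split())}
        )
    return summary


def find_blank_documents(documents: Iterable[Any]) -> List[int]:
    """Return the indexes of documents whose text is empty or whitespace only."""
    return [row["index"] for row in summarize_documents(documents) if row["chars"] == 0]
```

Printing summarize_documents(documents) right after loader.load() shows whether the single loaded document carries any usable text before it reaches the generator.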