Open Gwenn-LR opened 2 months ago
To resolve the issue of the `ragas.testset.generator.TestsetGenerator` object generating empty rows, you should ensure that the `generate` method is correctly initializing and executing the evolutions. Here are a few steps to check and potentially resolve the issue:
1. Check Document Addition: Ensure that documents are being correctly added to the `docstore`. This is done in the `generate_with_llamaindex_docs` and `generate_with_langchain_docs` methods.
2. Validate Distributions: Ensure that the distributions passed to the `generate` method sum to 1.0. This is validated in the `generate` method.
3. Initialize Evolutions: Ensure that evolutions are correctly initialized in the `init_evolution` method.
4. Execution of Evolutions: Ensure that the evolutions are being correctly executed and that the `Executor` is handling the tasks properly.
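As a rough illustration of the distribution check in step 2, the sketch below compares the probabilities after scaling them to three decimal places. The helper name `sums_to_one` and the string keys are assumptions made for this example (the real method uses evolution objects as keys and its own helper); this is not the library's implementation.

```python
def sums_to_one(distributions, num_places=3):
    """Hypothetical stand-in for the sum check performed in `generate`.

    Scales each probability to `num_places` decimals before summing, so
    small floating-point noise does not cause a false failure.
    """
    scale = 10 ** num_places
    total = sum(round(p * scale) for p in distributions.values())
    return total == scale


# String keys are placeholders for the evolution objects used in practice.
valid = {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}
invalid = {"simple": 0.5, "reasoning": 0.4}

print(sums_to_one(valid))
print(sums_to_one(invalid))
```

Running a check like this on your own distributions before calling `generate` makes it easy to rule out step 2 as the cause.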
Here is a snippet of the `generate` method that you should review and ensure is functioning correctly:
```python
def generate(
    self,
    test_size: int,
    distributions: t.Optional[Distributions] = None,
    with_debugging_logs=False,
    is_async: bool = True,
    raise_exceptions: bool = True,
    run_config: t.Optional[RunConfig] = None,
):
    distributions = distributions or DEFAULT_DISTRIBUTION
    # validate distributions
    if not check_if_sum_is_close(list(distributions.values()), 1.0, 3):
        raise ValueError(
            f"distributions passed do not sum to 1.0 [got {sum(list(distributions.values()))}]. Please check the "
            f"distributions."
        )

    # configure run_config for docstore
    if run_config is None:
        run_config = RunConfig(max_retries=15, max_wait=90)
    self.docstore.set_run_config(run_config)

    # init filters and evolutions
    for evolution in distributions:
        self.init_evolution(evolution)
        evolution.init(is_async=is_async, run_config=run_config)

    if with_debugging_logs:
        from ragas.utils import patch_logger

        patch_logger("ragas.testset.evolutions", logging.DEBUG)
        patch_logger("ragas.testset.extractor", logging.DEBUG)
        patch_logger("ragas.testset.filters", logging.DEBUG)
        patch_logger("ragas.testset.docstore", logging.DEBUG)
        patch_logger("ragas.llms.prompt", logging.DEBUG)

    exec = Executor(
        desc="Generating",
        keep_progress_bar=True,
        raise_exceptions=raise_exceptions,
        run_config=run_config,
    )

    current_nodes = [
        CurrentNodes(root_node=n, nodes=[n])
        for n in self.docstore.get_random_nodes(k=test_size)
    ]
    total_evolutions = 0
    for evolution, probability in distributions.items():
        for i in sample(range(test_size), round(probability * test_size)):
            exec.submit(
                evolution.evolve,
                current_nodes[i],
                name=f"{evolution.__class__.__name__}-{i}",
            )
            total_evolutions += 1
    if total_evolutions <= test_size:
        filler_evolutions = choices(
            list(distributions), k=test_size - total_evolutions
        )
        for evolution in filler_evolutions:
            exec.submit(
                evolution.evolve,
                current_nodes[total_evolutions],
                name=f"{evolution.__class__.__name__}-{total_evolutions}",
            )
            total_evolutions += 1

    try:
        test_data_rows = exec.results()
        if not test_data_rows:
            raise ExceptionInRunner()
    except ValueError as e:
        raise e

    # make sure to ignore any NaNs that might have been returned
    # due to failed evolutions. MaxRetriesExceeded is a common reason
    test_data_rows = [r for r in test_data_rows if not is_nan(r)]
    test_dataset = TestDataset(test_data=test_data_rows)
    evol_lang = [get_feature_language(e) for e in distributions]
    evol_lang = [e for e in evol_lang if e is not None]
    track(
        TestsetGenerationEvent(
            event_type="testset_generation",
            evolution_names=[e.__class__.__name__.lower() for e in distributions],
            evolution_percentages=[distributions[e] for e in distributions],
            num_rows=len(test_dataset.test_data),
            language=evol_lang[0] if len(evol_lang) > 0 else "",
        )
    )
    return test_dataset
```
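To make the task allocation in the snippet easier to follow, here is a small standalone sketch of the same sampling logic. The function `plan_evolutions`, the seeded `Random` instance, and the string names standing in for evolution objects are assumptions for illustration only. Note that Python's `round` rounds halves to the nearest even number, so `round(2.5)` is 2, which is why the filler step can be needed even when the probabilities sum to 1.0.

```python
from random import Random


def plan_evolutions(test_size, distributions, seed=0):
    """Return (evolution_name, node_index) pairs mirroring the loop in `generate`.

    Hypothetical helper: it only plans the submissions, it does not run them.
    """
    rng = Random(seed)
    plan = []
    for name, probability in distributions.items():
        # Each evolution gets round(probability * test_size) distinct nodes.
        for i in rng.sample(range(test_size), round(probability * test_size)):
            plan.append((name, i))
    total = len(plan)
    if total <= test_size:
        # Rounding can leave a shortfall; fill it with randomly chosen evolutions.
        for name in rng.choices(list(distributions), k=test_size - total):
            plan.append((name, total))
            total += 1
    return plan


plan = plan_evolutions(
    10, {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}
)
print(len(plan))
```

With `test_size=10` and these probabilities, the first loop plans 5 + 2 + 2 = 9 tasks and the filler step adds the tenth.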
Ensure that the `Executor` is correctly handling the evolution tasks and that the `test_data_rows` are being populated correctly. If the `test_data_rows` are empty, it could indicate an issue with the evolution process or the document addition to the `docstore` [1].
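One possible way to end up with empty rows and no exception can be read off the snippet itself: the emptiness check runs before the NaN filter. The sketch below assumes that every evolution failed and that the executor returned a NaN placeholder for each one, as the comment in the snippet suggests can happen. The `is_nan` function here is a hypothetical stand-in, not the library helper.

```python
import math


def is_nan(value):
    """Hypothetical stand-in for the `is_nan` helper used in the snippet."""
    return isinstance(value, float) and math.isnan(value)


# Assumption: all three evolutions failed and came back as NaN placeholders.
raw_rows = [float("nan"), float("nan"), float("nan")]

# The `if not test_data_rows` guard only sees a non-empty list, so it passes.
passes_empty_check = bool(raw_rows)

# The NaN filter then removes every row, leaving an empty test set.
kept_rows = [r for r in raw_rows if not is_nan(r)]

print(passes_empty_check, len(kept_rows))
```

If this matches what you observe, enabling `with_debugging_logs=True` may help show why the individual evolutions fail.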
[x] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
Hi! I'm currently working with `ragas` to test different RAG architectures, so I'm using the `Ollama`, `HuggingFace` and `LangChain` frameworks on top of `ragas`. I'm facing an issue while implementing a unit test around synthetic generation: the generated `ragas.testset.generator.TestsetGenerator` object has empty rows. I think it comes from a specific parametrization of each framework. After looking over the repository, it seems you tried to guard against such a situation, but the check does not raise any error.
Ragas version: 0.1.11
Python version: 3.10.12
Code to Reproduce
The `example.pdf` file used here can be found at: https://css4.pub/2015/usenix/example.pdf
Error trace
No error, but that is the problem.
Expected behavior
According to your code, one can expect to get a `ragas.exceptions.ExceptionInRunner` in such a situation.
Additional context
I'll offer a PR to fix this issue, but I don't know whether it will conflict with another part of the code.
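For reference, the expected behavior could look roughly like the sketch below, where the emptiness check happens after failed rows are dropped. This is only an illustration of the idea, not the contents of the planned PR: `EmptyTestsetError` and `drop_failed_rows` are hypothetical names, and the real code would presumably raise `ragas.exceptions.ExceptionInRunner` instead.

```python
import math


class EmptyTestsetError(RuntimeError):
    """Hypothetical exception standing in for `ExceptionInRunner`."""


def drop_failed_rows(rows):
    """Remove NaN placeholders, then fail loudly if nothing is left."""
    kept = [r for r in rows if not (isinstance(r, float) and math.isnan(r))]
    if not kept:
        raise EmptyTestsetError("all evolutions failed; no rows were generated")
    return kept


# A mixed result keeps the valid row and drops the failed one.
print(drop_failed_rows([{"question": "q1"}, float("nan")]))
```

Checking after the filter means a run where every evolution failed surfaces as an error rather than as a silently empty test set.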