Describe the Feature
The TestsetGenerator currently randomly selects nodes from the docstore with replacement. This means the same node may have multiple testsets generated for it while other nodes have none. Requesting that this be revisited to see if it makes sense to choose nodes from the docstore without replacement.
Why is the feature important for you?
For my use case, I want to generate a question for each document I provide. This isn't possible without overriding the docstore's get_random_nodes implementation to choose without replacement. E.g.:
from typing import List

import numpy as np
from ragas.testset.docstore import InMemoryDocumentStore, Node
from ragas.testset.utils import rng


class NoReplacementInMemoryDocumentStore(InMemoryDocumentStore):
    def get_random_nodes(self, k=1) -> List[Node]:
        # Return every node once per full pass over the docstore.
        node_copies = k // len(self.nodes)
        remainder = k % len(self.nodes)
        selected_nodes = self.nodes * node_copies
        if remainder == 0:
            return selected_nodes
        # Draw the remaining nodes without replacement.
        random_nodes = rng.choice(
            np.array(self.nodes),
            size=remainder,
            replace=False,
        ).tolist()
        # extend, not append: append would nest the list as a single element.
        selected_nodes.extend(random_nodes)
        return selected_nodes
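The override above depends on ragas internals, so here is a minimal, library-free sketch of the same selection idea for anyone who wants to check the behaviour in isolation. It is only an illustration: the function name sample_covering and its seed parameter are hypothetical, and it uses the standard library's random.sample in place of the numpy rng used above.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_covering(items: Sequence[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Pick k items so that every item is used n times before any is used n + 1 times.

    Hypothetical helper for illustration only; it is not part of ragas.
    """
    if not items:
        raise ValueError("items must not be empty")
    if k < 0:
        raise ValueError("k must be non-negative")

    generator = random.Random(seed)
    # Number of complete passes over the items, plus the leftover draws.
    full_passes, remainder = divmod(k, len(items))
    selected = list(items) * full_passes
    # random.sample draws without replacement, so the leftovers are distinct.
    selected.extend(generator.sample(list(items), remainder))
    return selected


if __name__ == "__main__":
    docs = ["doc_a", "doc_b", "doc_c", "doc_d"]
    # With k equal to the number of documents, each document appears exactly once.
    print(sample_covering(docs, k=len(docs), seed=0))
```

With k equal to the number of documents, each document is selected exactly once, which is the "one question per document" behaviour requested here. For larger k, the per-document counts differ by at most one.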