Open alexander-zuev opened 6 days ago
Hey @Twist333d, very valid concerns. 1) We plan to add support for cost estimation for test generation soon. 2) While efficiently generating (and regenerating) test data points from a single document set, we also have to make sure that newly generated points were not already generated before. So we intend to make this process efficient by doing a one-time preprocessing of the documents and then letting you persist the intermediate form (KG). In that case, one could repeatedly sample data points from the same corpus w/o redoing the preprocessing step. What do you think?
We will be continuously improving the new test gen, and would love to chat and understand more from you. https://cal.com/shahul-ragas/30min
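To make the preprocess-once, sample-many idea above concrete, here is a rough sketch in plain Python. It is not the Ragas API: `preprocess`, `save`, `load`, `sample`, and the JSON layout are all hypothetical names made up for illustration, and the "KG" here is just a flat list of sentence nodes rather than a real knowledge graph. It only shows the shape of the workflow: build the intermediate form once, persist it, and draw new points later without repeating earlier ones.

```python
import json
import random
import tempfile
from pathlib import Path


def preprocess(documents):
    """Hypothetical one-time step: turn raw documents into an intermediate form.

    Each sentence becomes a node; a real KG would also hold entities and relations.
    """
    nodes = []
    for doc_id, text in enumerate(documents):
        sentences = (s.strip() for s in text.split(".") if s.strip())
        for sent_id, sentence in enumerate(sentences):
            nodes.append({"id": f"{doc_id}-{sent_id}", "text": sentence})
    # "used" records node ids that have already been sampled.
    return {"nodes": nodes, "used": []}


def save(state, path):
    """Persist the intermediate form (assumed here to be a local JSON file)."""
    Path(path).write_text(json.dumps(state))


def load(path):
    """Reload a previously persisted intermediate form."""
    return json.loads(Path(path).read_text())


def sample(state, k, seed=None):
    """Draw up to k nodes that were never sampled before and mark them as used."""
    used = set(state["used"])
    fresh = [n for n in state["nodes"] if n["id"] not in used]
    rng = random.Random(seed)
    picked = rng.sample(fresh, min(k, len(fresh)))
    state["used"].extend(n["id"] for n in picked)
    return picked


if __name__ == "__main__":
    docs = ["Ragas generates test sets. It uses documents.", "Costs can add up."]
    with tempfile.TemporaryDirectory() as tmp:
        kg_path = Path(tmp) / "kg.json"
        state = preprocess(docs)          # expensive step, done once
        print(sample(state, 2, seed=0))   # first batch
        save(state, kg_path)              # persist nodes plus the "used" record
        state = load(kg_path)             # later session: no preprocessing needed
        print(sample(state, 2, seed=0))   # only the remaining, unseen node
```

In this sketch the record of already-sampled ids is stored alongside the nodes, so a later session that reloads the file cannot return a duplicate. Whether the actual intermediate form would track this, and where it would be stored, is an open design question.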
@shahules786 thanks for the prompt response! I think what you describe does indeed address the two core concerns I had:
On the proposal above, how would a user manage the intermediate form? Would I need to save it / manage it locally?
Describe the Feature There are 2 problems with the current dataset generator approach:
Why is the feature important for you?