[ ] create test set of at least 10 prompts with their correct answers
[ ] set up pipeline that tests against that test set and reports accuracy
[ ] If a user adds a visualization to their dashboard, then that input, query, result set goes into the training set. The adding to the visualization is the final check in the process that it's correct.
[ ] set up pipeline that tests against that test set and reports accuracy
[ ] If a user adds a visualization to their dashboard, then that input, query, result set goes into the training set. The adding to the visualization is the final check in the process that it's correct.