Implemented a backend store to track runs, tests, and traces (see `store/`).
Added CLI commands to read data from the backend store and to perform required schema upgrades.
Updated the Markdown summary to display run metadata (e.g., run ID and start/end times).
Modified `agenteval.test.Test` to be the parent model, which contains a `TestResult` and an `Expected`.
Renamed the test plan config key `expected_results` to `expected`, which is now a map. The expected results for a conversation are now specified under `tests.<name>.expected.conversation`.
Updated the docs to reflect these changes.
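To illustrate the run, test, and trace hierarchy that the backend store tracks, here is a minimal sketch assuming SQLite. The table names, columns, and the `Store` class are hypothetical and are not the code in `store/`.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: runs contain tests, and tests contain trace steps.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id TEXT PRIMARY KEY,
    start_time TEXT NOT NULL,
    end_time TEXT
);
CREATE TABLE IF NOT EXISTS tests (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL REFERENCES runs(id),
    name TEXT NOT NULL,
    passed INTEGER
);
CREATE TABLE IF NOT EXISTS traces (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    test_id INTEGER NOT NULL REFERENCES tests(id),
    step TEXT NOT NULL
);
"""


def _now() -> str:
    # Timestamps are stored as ISO 8601 strings in UTC.
    return datetime.now(timezone.utc).isoformat()


class Store:
    """Hypothetical store keyed by run, then test, then trace step."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)

    def start_run(self, run_id: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO runs (id, start_time) VALUES (?, ?)", (run_id, _now())
            )

    def end_run(self, run_id: str) -> None:
        with self.conn:
            self.conn.execute(
                "UPDATE runs SET end_time = ? WHERE id = ?", (_now(), run_id)
            )

    def add_test(self, run_id: str, name: str, passed: bool) -> int:
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO tests (run_id, name, passed) VALUES (?, ?, ?)",
                (run_id, name, int(passed)),
            )
        return cur.lastrowid  # id used to attach trace steps

    def add_trace(self, test_id: int, step: str) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT INTO traces (test_id, step) VALUES (?, ?)", (test_id, step)
            )

    def traces_for_run(self, run_id: str) -> list[tuple[str, str]]:
        # Returns (test name, trace step) pairs in insertion order.
        return self.conn.execute(
            "SELECT t.name, tr.step FROM tests AS t "
            "JOIN traces AS tr ON tr.test_id = t.id "
            "WHERE t.run_id = ? ORDER BY tr.id",
            (run_id,),
        ).fetchall()
```

Keeping traces in their own table keyed by test id means a reader (such as a CLI command) can fetch a single test's conversation without loading the whole run.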
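For the schema upgrades performed by the new CLI commands, one common approach with SQLite is to keep a schema version in `PRAGMA user_version` and apply ordered migrations. The sketch below assumes that approach; the `MIGRATIONS` list and the `upgrade` function are hypothetical and do not describe the actual CLI commands or their names.

```python
import sqlite3

# Hypothetical migrations: MIGRATIONS[i] upgrades schema version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE runs (id TEXT PRIMARY KEY, start_time TEXT NOT NULL)",
    "ALTER TABLE runs ADD COLUMN end_time TEXT",
]


def upgrade(conn: sqlite3.Connection) -> int:
    """Apply pending migrations and return the resulting schema version.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below controls each transaction.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current, len(MIGRATIONS)):
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[version])
            # PRAGMA values cannot be bound as parameters; version is an int.
            conn.execute(f"PRAGMA user_version = {version + 1}")
            conn.execute("COMMIT")  # migration and version bump land together
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running `upgrade` twice is safe: the second call finds no pending migrations and returns the current version unchanged.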
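As an illustration of the run metadata added to the Markdown summary, the following sketch renders a run ID and start/end times as a Markdown table. The function name and table layout are hypothetical; the actual summary template may format these fields differently.

```python
from datetime import datetime


def run_metadata_markdown(run_id: str, start: datetime, end: datetime) -> str:
    """Render run metadata as a Markdown table (hypothetical layout)."""
    rows = [
        "| Field | Value |",
        "| --- | --- |",
        f"| Run ID | {run_id} |",
        f"| Start time | {start.isoformat()} |",
        f"| End time | {end.isoformat()} |",
    ]
    return "\n".join(rows)
```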
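The restructured `agenteval.test.Test` acts as a parent that holds both what is expected and what happened. A minimal sketch of that composition follows, using standard-library dataclasses for brevity. The field names (`conversation`, `passed`, `reasoning`, `result`) are hypothetical, and the real models may be defined differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Expected:
    # Hypothetical: expected outcomes for the conversation.
    conversation: list[str] = field(default_factory=list)


@dataclass
class TestResult:
    # Hypothetical result fields.
    passed: bool
    reasoning: str = ""


@dataclass
class Test:
    # Parent model: owns the expectations and, once run, the result.
    name: str
    expected: Expected
    result: Optional[TestResult] = None  # None until the test has run
```

Keeping `result` optional lets the same object represent a test both before and after it runs.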
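To show the config change, the sketch below converts a parsed test plan from the old `expected_results` key to the new `expected` map and reads the value back from `tests.<name>.expected.conversation`. It assumes that `tests` maps test names to their config and that the old `expected_results` held the conversation expectations. The `steps` key, the test name, and both helper functions are hypothetical.

```python
import copy
from typing import Any


def migrate_test_plan(plan: dict[str, Any]) -> dict[str, Any]:
    """Return a copy with each test's `expected_results` moved to `expected.conversation`."""
    migrated = copy.deepcopy(plan)  # leave the caller's plan untouched
    for test in migrated.get("tests", {}).values():
        if "expected_results" in test:
            results = test.pop("expected_results")
            test.setdefault("expected", {})["conversation"] = results
    return migrated


def expected_conversation(plan: dict[str, Any], name: str) -> list[str]:
    # Equivalent to the path tests.<name>.expected.conversation.
    return plan["tests"][name]["expected"]["conversation"]


old_plan = {
    "tests": {
        "check_balance": {
            "steps": ["Ask the agent for the account balance."],
            "expected_results": ["The agent returns the balance."],
        }
    }
}
new_plan = migrate_test_plan(old_plan)
```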
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.