awslabs / agent-evaluation

A generative AI-powered framework for testing virtual agents.
https://awslabs.github.io/agent-evaluation/
Apache License 2.0
118 stars, 20 forks

Address inconsistent evaluation results #24

Closed tonykchen closed 7 months ago

tonykchen commented 8 months ago

The current evaluation flow sometimes produces inconsistent evaluation results. To address this, we may need to break the generation tasks down further and introduce exit points in the flow.
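To make the idea concrete, here is a rough sketch of what "smaller generation tasks plus exit points" could look like. It is not the project's actual implementation: `run_evaluation`, `EvaluationResult`, the prompt prefixes, and the `ask`/`agent` callables are all hypothetical names used for illustration. The assumption is that each model call answers one narrow question, and that the flow returns as soon as an answer settles the outcome or cannot be parsed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical callables: `ask` sends one narrow prompt to the evaluator model,
# `agent` sends one user message to the agent under test.
Ask = Callable[[str], str]
Agent = Callable[[str], str]


@dataclass
class EvaluationResult:
    passed: bool
    exit_point: str  # which exit point ended the flow
    turns: int


def parse_yes_no(text: str) -> Optional[bool]:
    """Map a model answer to True/False, or None if it is neither."""
    answer = text.strip().lower()
    if answer.startswith("yes"):
        return True
    if answer.startswith("no"):
        return False
    return None


def run_evaluation(ask: Ask, agent: Agent, task: str, expected: str,
                   max_turns: int = 3) -> EvaluationResult:
    history: List[Tuple[str, str]] = []
    for turn in range(1, max_turns + 1):
        # Task 1: generate only the next user message.
        user_msg = ask(f"GENERATE_USER_MESSAGE\nTask: {task}\nHistory: {history}")
        history.append((user_msg, agent(user_msg)))

        # Task 2: a yes/no check on completion, kept separate from judging.
        done = parse_yes_no(ask(f"IS_TASK_COMPLETE\nTask: {task}\nHistory: {history}"))
        if done is None:
            # Exit point: do not guess when the answer is ambiguous.
            return EvaluationResult(False, "unparseable_completion_check", turn)
        if done:
            # Task 3: a yes/no check against the expected result.
            met = parse_yes_no(ask(f"IS_EXPECTED_MET\nExpected: {expected}\nHistory: {history}"))
            if met is None:
                return EvaluationResult(False, "unparseable_expectation_check", turn)
            # Exit point: the conversation finished and was judged.
            return EvaluationResult(met, "task_complete", turn)

    # Exit point: the turn budget ran out before the task completed.
    return EvaluationResult(False, "max_turns_reached", max_turns)
```

Keeping each prompt to a single question, and recording which exit point ended the run, should make it easier to see where two runs of the same test diverge.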