Striveworks / valor

Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.
https://striveworks.github.io/valor/

Handle bad llm response with retries (llm-guided metrics) #743

Closed bnativi closed 1 month ago

bnativi commented 2 months ago

With the initial text generation metric PR, if an LLM provides an invalid response for one of our LLM-guided metrics (wrongly formatted, wrong data type, etc.), then Valor raises an error and the rest of the evaluation is not completed. This seems like a poor user experience, although we should collect some user feedback on this.

Two improvements could be made:

bnativi commented 2 months ago

PR #728 adds retries that simply repeat the same LLM request as before. If a seed is set (say, for OpenAI's API), then this retry logic should return the same mis-formatted response. However, if a seed is not set, the LLM's responses can vary from call to call even with the same input, so making the exact same call again may yield a valid response.
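The retry-on-invalid-response idea could be sketched as follows. This is a minimal illustration, not Valor's actual implementation: the `client.complete` method, `parse_fn` callback, and exception names are all hypothetical stand-ins.

```python
import json


class InvalidLLMResponseError(Exception):
    """Raised when the LLM response cannot be parsed into the expected format."""


def call_llm_with_retries(client, messages, parse_fn, max_retries=3, seed=None):
    """Call a (hypothetical) LLM client, retrying when the response fails validation.

    Note: if `seed` is fixed and the API is deterministic, every retry will
    likely return the same malformed response; retries mainly help when
    sampling is non-deterministic.
    """
    last_error = None
    for _ in range(max_retries):
        raw = client.complete(messages, seed=seed)  # hypothetical client API
        try:
            # parse_fn validates format and data type, e.g. json.loads plus
            # schema checks, raising on any malformed response.
            return parse_fn(raw)
        except (ValueError, KeyError) as err:
            last_error = err
    raise InvalidLLMResponseError(
        f"LLM returned an invalid response after {max_retries} attempts"
    ) from last_error
```

With this shape, a single metric's bad response raises a dedicated exception after the retry budget is exhausted, which the caller can catch to skip that metric instead of aborting the whole evaluation.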