Closed roaga closed 1 week ago
how much higher is the error rate? hard to tell from these metrics
Good runs: 98
Errored runs: 5
Error rate: 0.04
Errored in root cause: 0
Errored in plan: 3
Error rate in root cause: 0.00
Error rate in plan: 0.02
Error rate in something after plan: 0.02
Runs with unapplied changes: 24
Missing change rate: 0.19
These are the correct numbers; the eval script is bugged. @trillville
So the error rate is actually unaffected.
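As a rough sanity check on the numbers above: every reported rate is consistent with dividing by 127 runs, i.e. 98 good + 5 errored + 24 with unapplied changes. That denominator is an inference from the rounded figures, not something the eval script is confirmed to do, and the function and parameter names below are made up for illustration. A minimal Python sketch under that assumption:

```python
def summarize_runs(good, errored, errored_root_cause, errored_plan, unapplied):
    """Recompute the reported rates from raw counts.

    ASSUMPTION: the denominator is good + errored + unapplied. This is
    inferred from the rounded rates in the thread, not taken from the
    actual eval script.
    """
    total = good + errored + unapplied
    # Errors that happened neither in root cause nor in plan.
    errored_after_plan = errored - errored_root_cause - errored_plan
    return {
        "error_rate": round(errored / total, 2),
        "error_rate_root_cause": round(errored_root_cause / total, 2),
        "error_rate_plan": round(errored_plan / total, 2),
        "error_rate_after_plan": round(errored_after_plan / total, 2),
        "missing_change_rate": round(unapplied / total, 2),
    }


if __name__ == "__main__":
    # Counts quoted in the comment above.
    for name, value in summarize_runs(98, 5, 0, 3, 24).items():
        print(f"{name}: {value:.2f}")
```

With the counts quoted above, `summarize_runs(98, 5, 0, 3, 24)` gives 0.04, 0.00, 0.02, 0.02 and 0.19, matching the reported rates.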
Actually, going with Haiku now instead of Sonnet; the evals are better.
Switching the model from GPT-4o to Claude 3.5 Haiku results in better latency, better coding, and better root cause results (see row 2 below):