defog-ai / sql-eval

Evaluate the accuracy of LLM generated outputs
Apache License 2.0

Post results to Slack after a grid search is completed #182

Closed: rishsriv closed this 3 months ago

rishsriv commented 3 months ago

After evals finish running across checkpoints, this tool plots the results in a simple scatterplot and uploads them to Slack.
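A minimal sketch of the flow described above, not the code in this PR. It assumes each checkpoint's result is a dict with hypothetical keys `checkpoint`, `step`, `accuracy`, and `run_id`, and that results are posted through a Slack incoming webhook whose URL you supply. The function names are illustrative only.

```python
import json
import urllib.request


def build_slack_payload(results):
    """Format per-checkpoint results as a Slack incoming-webhook payload."""
    # Best checkpoint first, so the top of the message is the headline number.
    ranked = sorted(results, key=lambda r: r["accuracy"], reverse=True)
    lines = ["Grid search finished. Accuracy by checkpoint:"]
    for r in ranked:
        lines.append(
            f"- {r['checkpoint']}: {r['accuracy']:.1%} (run_id={r['run_id']})"
        )
    return {"text": "\n".join(lines)}


def save_scatterplot(results, path="grid_search.png"):
    """Plot accuracy against checkpoint step and label each point with its run id."""
    # Imported lazily so the rest of the module works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([r["step"] for r in results], [r["accuracy"] for r in results])
    for r in results:
        ax.annotate(r["run_id"], (r["step"], r["accuracy"]))
    ax.set_xlabel("checkpoint step")
    ax.set_ylabel("accuracy")
    fig.savefig(path)
    plt.close(fig)
    return path


def post_to_slack(webhook_url, payload):
    """POST the JSON payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Incoming webhooks only carry message text, so uploading the scatterplot image itself would need Slack's file upload API and a bot token instead. Including each `run_id` in the message is what makes it possible to look up an individual run afterwards.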

In addition to the results, we can also see the individual IDs of the different runs and then do a deep dive into them with eval-visualizer.

Lastly, this fixes a subtle bug in how different runs are uploaded.