Closed rishsriv closed 3 months ago
After evals finish running across checkpoints, this tool visualizes them in a simple scatterplot and also uploads the results to Slack.
In addition to the results, we can see the individual IDs of the different runs and then dig into any of them with eval-visualizer.
Lastly, this PR fixes a subtle bug in the uploads of different runs.
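
For reviewers who want a feel for the plotting step, here is a minimal sketch of how per-run results might be turned into scatterplot points labelled by run ID. It is not the code in this PR: the record format (`run_id`, `checkpoint`, `correct`, `total`), the function names `to_points` and `plot_points`, and the output filename are all assumptions made for illustration, and the Slack upload is left out.

```python
# Hypothetical record format: one dict per eval run (not taken from this PR).
runs = [
    {"run_id": "run-a", "checkpoint": 100, "correct": 40, "total": 100},
    {"run_id": "run-b", "checkpoint": 200, "correct": 55, "total": 100},
    {"run_id": "run-c", "checkpoint": 200, "correct": 60, "total": 100},
]


def to_points(runs):
    """Return (checkpoint, accuracy, run_id) tuples sorted by checkpoint, then accuracy."""
    points = [
        (r["checkpoint"], r["correct"] / r["total"], r["run_id"]) for r in runs
    ]
    return sorted(points)


def plot_points(points, path="evals.png"):
    """Draw one dot per run, annotate it with its run ID, and save the figure to `path`."""
    # Imported lazily so the aggregation above works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # headless backend: no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([p[0] for p in points], [p[1] for p in points])
    for step, acc, run_id in points:
        ax.annotate(run_id, (step, acc))  # the label makes it easy to look the run up later
    ax.set_xlabel("checkpoint")
    ax.set_ylabel("accuracy")
    fig.savefig(path)
    plt.close(fig)
    return path


points = to_points(runs)
```

Calling `plot_points(points)` writes the image to disk; labelling each dot with its run ID is what would let someone go from an outlier on the plot to the matching run in eval-visualizer.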