defog-ai / sql-eval

Evaluate the accuracy of LLM generated outputs
Apache License 2.0

Upload wandb results #164

Closed: wongjingping closed this 4 weeks ago

wongjingping commented 4 weeks ago

Added a notebook for uploading vllm eval results to wandb post hoc. The notebook takes the results produced by run_checkpoints.sh and run_checkpoints_cot.sh and uploads them to the specified run id on wandb.

One quirk of wandb is that it doesn't allow you to log to past steps once they have been completed (see forum post). To work around this, we simply continue logging from the latest step onwards. As a result, the steps displayed on wandb won't match the actual checkpoints, but this should be sufficient for comparison purposes. For example, in the screenshot below you can see that the eval for each step only starts after 1k steps:

[Screenshot: wandb vllm]
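The workaround described above amounts to remapping each checkpoint onto a fresh, increasing wandb step. Below is a minimal sketch, assuming the eval results have already been collected into a dict keyed by checkpoint step. The function name plan_wandb_steps, the metric names, and the extra checkpoint_step field are hypothetical and not taken from the notebook. The function assigns consecutive steps starting at the first step that can still be logged to, and it keeps the real checkpoint number alongside each set of metrics so the offset can be undone when comparing.

```python
from typing import Dict, List, Tuple

Metrics = Dict[str, float]


def plan_wandb_steps(
    checkpoint_metrics: Dict[int, Metrics], start_step: int
) -> List[Tuple[int, Metrics]]:
    """Assign each checkpoint's eval metrics to a new, increasing wandb step.

    wandb ignores data logged at a step lower than the run's current step, so
    rather than logging at the checkpoint's own training step we log at
    start_step, start_step + 1, ... in checkpoint order. The real checkpoint
    step is kept as an extra value ("checkpoint_step", a hypothetical name).
    """
    if start_step < 0:
        raise ValueError("start_step must be non-negative")
    planned: List[Tuple[int, Metrics]] = []
    for offset, ckpt in enumerate(sorted(checkpoint_metrics)):
        metrics = dict(checkpoint_metrics[ckpt])  # copy so the input is not mutated
        metrics["checkpoint_step"] = ckpt
        planned.append((start_step + offset, metrics))
    return planned


# Hypothetical upload loop (needs the wandb package and network access, so it is
# only sketched here). It assumes run.step on a resumed run is the first step
# that can still be logged to; check this against your wandb version.
#
#   import wandb
#   run = wandb.init(project="<project>", id="<run_id>", resume="must")
#   for step, metrics in plan_wandb_steps(results, start_step=run.step):
#       run.log(metrics, step=step)
#   run.finish()

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = {2000: {"eval/accuracy": 0.66}, 1000: {"eval/accuracy": 0.61}}
    for step, metrics in plan_wandb_steps(example, start_step=3000):
        print(step, metrics)
```

Because the checkpoint number is logged as a value, the charts can still be lined up against the actual checkpoints (for example by picking it as a custom x-axis in the wandb UI), even though the wandb step counter itself is offset.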

rishsriv commented 4 weeks ago

YESSSS! Thank you!