empirical-run / empirical

Test and evaluate LLMs and model configurations, across all the scenarios that matter for your application
https://docs.empirical.run
MIT License
149 stars, 13 forks

fix: humaneval prompt and eval updated #20

Closed: arjunattam closed this 7 months ago

arjunattam commented 7 months ago

Drive-by: added a failing test for a long-running eval script.