fix: humaneval prompt and eval updated

empirical-run / empirical

Test and evaluate LLMs and model configurations, across all the scenarios that matter for your application

https://docs.empirical.run

MIT License

Closed: arjunattam closed 7 months ago

arjunattam commented 7 months ago

drive-by: added a failing test for a long-running eval script