abacaj / code-eval

Run evaluation on LLMs using human-eval benchmark
MIT License
379 stars · 36 forks

Where is `evaluate_functional_correctness`? #17

Open · invade-art opened this issue 3 months ago