Jellyfish042 / uncheatable_eval

Evaluating LLMs with Dynamic Data
MIT License
66 stars, 4 forks

Batch eval #6

Closed: Jellyfish042 closed this issue 2 months ago.