Closed: StellaAthena closed this issue 1 year ago
This repository started out as a way to combine 2-3 evaluation benchmarks and has since grown a bit. It is only a few weeks old, and we may adapt and refactor it where it makes sense as we continue to develop it.
That makes sense! I've been involved with the development of the EleutherAI harness and led the BigScience eval workshop, so feel free to reach out if you want to chat about any of this :)
Hi @StellaAthena, I made this PR to refactor the codebase so that tasks are built in separate files, as you do in lm-evaluation-harness; it should make the library easier to build on. I also updated the acknowledgment section to credit EleutherAI and now mention it at the top of the README, since I used a similar architecture. Please let me know if you have any comments. A rough sketch of the per-task layout is below.
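For context, here is a minimal sketch of the "one task per file plus a central registry" layout, loosely modeled on the lm-evaluation-harness pattern; the names here (`Task`, `TASK_REGISTRY`, `HumanEval`) are illustrative placeholders, not the actual API of either library:

```python
# Hypothetical sketch: each benchmark lives in its own module and subclasses
# a shared Task base class, while a registry keeps the runner generic.

from abc import ABC, abstractmethod


class Task(ABC):
    """Base class that every task file subclasses."""

    @abstractmethod
    def get_dataset(self):
        """Return the evaluation examples."""

    @abstractmethod
    def postprocess_generation(self, generation, idx):
        """Clean up a raw model generation for example `idx`."""

    @abstractmethod
    def process_results(self, generations, references):
        """Compute and return the task's metrics as a dict."""


# e.g. tasks/humaneval.py would define a single subclass...
class HumanEval(Task):
    def get_dataset(self):
        return []  # placeholder: load the benchmark data here

    def postprocess_generation(self, generation, idx):
        return generation

    def process_results(self, generations, references):
        return {"pass@1": 0.0}  # placeholder metric


# ...and the registry maps task names to classes, so adding a new
# benchmark only means adding a file and one registry entry.
TASK_REGISTRY = {"humaneval": HumanEval}


def get_task(name):
    return TASK_REGISTRY[name]()
```

The point of this structure is that the evaluation loop never needs to know about individual benchmarks; it just looks up a task by name and calls the shared interface.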
@loubnabnl Apologies for not replying more promptly, but this revamp looks great! I very much look forward to using the library and potentially helping out with the BigCode project in general.
This library seems overly hard-coded, to the point where building off of it would be difficult. Is there a particular reason you decided to use this architecture? Or is there a reason a fork of https://github.com/EleutherAI/lm-evaluation-harness or https://github.com/BigScience-Workshop/lm-evaluation-harness wouldn't suit your needs?