bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Library seems unnecessarily hardcoded #10

Closed by StellaAthena 1 year ago

StellaAthena commented 2 years ago

This library seems overly hard-coded, to the point where building off of it would be difficult. Is there a particular reason you decided to use this architecture for your code? Or why a fork of https://github.com/EleutherAI/lm-evaluation-harness or https://github.com/BigScience-Workshop/lm-evaluation-harness wouldn't suit your needs?

lvwerra commented 2 years ago

This repository started out as a way to combine 2-3 evaluation benchmarks and has since grown a bit. It is only a few weeks old, and we may adapt and refactor it where it makes sense as we continue to develop it.

StellaAthena commented 2 years ago

That makes sense! I've been involved with the development of the EleutherAI harness and lead the BigScience eval workshop, so feel free to reach out if you want to chat about any of this :)

loubnabnl commented 2 years ago

Hi @StellaAthena, I made this PR to refactor the codebase so that tasks are built in separate files, as in lm-evaluation-harness; this should make the library easier to build off of. I also updated the acknowledgment section to credit EleutherAI, and now mention it at the top of the README as well, since I used a similar architecture. Please let me know if you have any comments.
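
To illustrate the task-per-file layout, here is a rough sketch of what one task module could look like. The class, method names, dataset id, and registry below are illustrative assumptions in the spirit of lm-evaluation-harness, not the exact API introduced in the PR:

```python
# illustrative_task.py -- a sketch only; names and signatures are assumptions,
# not the exact bigcode-evaluation-harness API.
from datasets import load_dataset


class Task:
    """Minimal base class: each benchmark lives in its own file and
    implements these hooks, so adding a task never touches the core loop."""

    DATASET_PATH = None  # Hugging Face dataset id, set by subclasses

    def get_dataset(self):
        return load_dataset(self.DATASET_PATH)["test"]

    def get_prompt(self, doc):
        raise NotImplementedError

    def get_reference(self, doc):
        raise NotImplementedError

    def postprocess_generation(self, generation, idx):
        raise NotImplementedError

    def process_results(self, generations, references):
        raise NotImplementedError


class HumanEvalLike(Task):
    """Hypothetical task module for a HumanEval-style benchmark."""

    DATASET_PATH = "openai_humaneval"

    def get_prompt(self, doc):
        return doc["prompt"]

    def get_reference(self, doc):
        return doc["test"]

    def postprocess_generation(self, generation, idx):
        # Naive cleanup: keep only the completed function, drop trailing text.
        return generation.split("\nclass")[0]

    def process_results(self, generations, references):
        # Execute the unit tests / compute pass@k here; dummy value shown.
        return {"pass@1": 0.0}


# A simple registry maps task names to classes so the CLI can look them up.
TASK_REGISTRY = {"humaneval_like": HumanEvalLike}
```

With a layout like this, supporting a new benchmark amounts to adding one file and one registry entry, rather than editing the shared evaluation code.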

StellaAthena commented 1 year ago

@loubnabnl Apologies for not replying more promptly, but this revamp looks great! I very much look forward to using the library and potentially helping out with the BigCode project in general.