Aider-AI / refactor-benchmark

Aider's refactoring benchmark exercises based on popular Python repos
Apache License 2.0
46 stars, 2 forks

Requesting model to test? #3

Open TomLucidor opened 2 weeks ago

TomLucidor commented 2 weeks ago

Neither the Code Editing nor the Refactoring benchmark includes enough LLMs such as the Phi-3 / Phi-3.1 / Phi-3.5 family of models, which are claimed to be both small and capable. I noticed this while looking at dashboards such as BigCodeBench and EvalPlus.

paul-gauthier commented 2 weeks ago

Thanks for trying aider and filing this issue.

PRs are welcome!

https://aider.chat/docs/leaderboards/#contributing-benchmark-results