Significant-Gravitas / AutoGPT-Code-Ability

🖥️ AutoGPT's Coding Ability - empowering everyone to build software using AI
MIT License
100 stars, 26 forks

Benchmarking with SWE-Bench (or its "Lite" version) #296

Open: BradKML opened this issue 4 weeks ago

BradKML commented 4 weeks ago

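SWE-bench (https://www.swebench.com/) is a benchmark designed for fixing issues across an entire repository. It is already used to evaluate other tools such as OpenDevin, AutoCodeRover, Aider, and SWE-Agent. It would be useful to benchmark this project against it, or against its "Lite" version.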

Reference for testing: https://github.com/aorwall/SWE-bench-docker