entropy-research / Devon

Devon: An open-source pair programmer
GNU Affero General Public License v3.0
3.16k stars, 255 forks

Benchmark on SWE-Bench #10

Open distbit0 opened 5 months ago

distbit0 commented 5 months ago

It would be interesting to see the performance on SWE-Bench benchmarks, so that this project can be more clearly differentiated from the increasing number of other coding agents.

akiradev0x commented 5 months ago

Hey, yep! We are working on it right now!

akiradev0x commented 4 months ago

Hey, following up on this. We just refactored and standardized the environments. This should allow us (and anyone who wants to) to run on SWE-bench.

nashid commented 3 months ago

Have you conducted benchmarks of Devon using the SWE-Bench dataset?

akiradev0x commented 3 months ago

@nashid We have. The results were pretty good but incomplete. We will be pushing for it next week!

BradKML commented 2 months ago

Would like to see how this goes vs. Aider, AutoCodeRover, or OpenDevin. https://github.com/aorwall/SWE-bench-docker