distbit0 opened this issue 5 months ago
Hey! Yep, we are working on it right now!
Hey, following up on this. We just refactored and standardized the environments, which should allow us (and anyone else who wants to) to run on SWE-bench.
Have you conducted benchmarks of Devon using the SWE-Bench dataset?
@nashid We have; the results were pretty good but incomplete. We will be pushing for it next week!
I'd like to see how this compares with Aider, AutoCodeRover, or OpenDevin: https://github.com/aorwall/SWE-bench-docker
It would be interesting to see the performance on the SWE-bench benchmark, so that this project can be more clearly differentiated from the growing number of other coding agents.
https://www.swebench.com/
https://github.com/princeton-nlp/SWE-bench
https://arxiv.org/abs/2310.06770