AbanteAI / mentat

Mentat - The AI Coding Assistant
https://mentat.ai
Apache License 2.0
2.54k stars 234 forks

Benchmark on SWE-Bench #561

Open: distbit0 opened 5 months ago

distbit0 commented 5 months ago

It would be interesting to see how Mentat performs on SWE-Bench, so that this project can be more clearly differentiated from the growing number of other coding agents.

granawkins commented 5 months ago

I actually just did an alpha-ish implementation of this! You can clone the repo and run:

python benchmarks/benchmark_runner.py --swe_bench

It's pretty fragile atm and might not work immediately, but yeah, SWE-Bench is emerging as the benchmark to beat, so we'll be using it heavily moving forward.
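
Since the runner is described as fragile, it can help to check that the script exists before launching it. The following is a minimal sketch, not part of Mentat: only the script path and the `--swe_bench` flag come from the comment above, while the helper names `build_command` and `run_swe_bench` are hypothetical, and any setup the benchmark needs (dependencies, API keys) is assumed to be done already.

```python
import subprocess
import sys
from pathlib import Path

# Script path and flag are taken from the comment above.
# Everything else here is an illustrative assumption.
SCRIPT = Path("benchmarks") / "benchmark_runner.py"


def build_command():
    """Return the argv list for the SWE-Bench run, relative to the repo root."""
    return [sys.executable, SCRIPT.as_posix(), "--swe_bench"]


def run_swe_bench(repo_dir):
    """Run the benchmark from repo_dir.

    Returns the process exit code, or None if the script is missing.
    """
    repo = Path(repo_dir)
    if not (repo / SCRIPT).is_file():
        print(f"{repo / SCRIPT} not found; is {repo} a mentat checkout?")
        return None
    # Run from the repo root, matching how the original command is invoked.
    return subprocess.run(build_command(), cwd=repo).returncode
```

Calling `run_swe_bench("path/to/mentat")` on a local clone runs the same command as above and passes its exit code back, so a wrapper script can tell a missing checkout apart from a failed run.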