codestoryai / swe_bench_traces

Contains the model patches and the eval logs from the passing swe-bench-lite run.

New SWE-bench submission #2

Open EwoutH opened 2 weeks ago

EwoutH commented 2 weeks ago

Hi @theskcd!

Since you currently lead the SWE-bench Lite Leaderboard, I was curious whether you were planning to submit a new version of Aide, perhaps to multiple leaderboards.

In the SWE Bench Lite Analysis blog it was noted that:

"Impossible questions: There are a set of impossible questions which cannot be solved by any framework or human. These involve questions where the error needs to be formatted exactly as the test."

SWE-bench Verified might help with that: all 500 of its questions have been human-validated for quality. Aide might even be the first model to break the 50% threshold and go down in history that way!

theskcd commented 2 weeks ago

definitely! we have been a bit busy right now with working on the editor side of changes and helping our users (the pain of growth)

but we have our eyes on swe-bench-verified and are very keen to test on it