Hi @theskcd!
Since you are currently leading the SWE-bench Lite leaderboard, I was curious whether you plan to submit a new version of Aide, perhaps to multiple leaderboards.
In the SWE Bench Lite Analysis blog, it was noted that:
Impossible questions
There are a set of impossible questions which cannot be solved by any framework or human. These involve questions where the error needs to be formatted exactly as the test.
SWE-bench Verified might help with that: all 500 of its questions have been human-validated for quality. Aide could be the first model to break the 50% threshold and go down in history this way!