codestoryai / swe_bench_traces

Contains the model patches and the eval logs from the passing swe-bench-lite run.


 ██████╗ ██████╗ ██████╗ ███████╗███████╗████████╗ ██████╗ ██████╗ ██╗   ██╗
██╔════╝██╔═══██╗██╔══██╗██╔════╝██╔════╝╚══██╔══╝██╔═══██╗██╔══██╗╚██╗ ██╔╝
██║     ██║   ██║██║  ██║█████╗  ███████╗   ██║   ██║   ██║██████╔╝ ╚████╔╝ 
██║     ██║   ██║██║  ██║██╔══╝  ╚════██║   ██║   ██║   ██║██╔══██╗  ╚██╔╝  
╚██████╗╚██████╔╝██████╔╝███████╗███████║   ██║   ╚██████╔╝██║  ██║   ██║   
 ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝╚══════╝   ╚═╝    ╚═════╝ ╚═╝  ╚═╝   ╚═╝   

SWE-Bench-Lite results: 40.3% (SOTA at the time of commit)

At CodeStory, we are building Aide, a new-age editor made for working alongside agents. Unlike AI engineers, which throw users out of the loop, and chat/copilots, which are very much triggered by humans, we envision an editor where agents and developers come together to hack and collaborate.

At the time of this commit, the agentic framework powering Aide scores 40.3%, setting a new benchmark on SWE-Bench-Lite.