Aider-AI / aider

aider is AI pair programming in your terminal
https://aider.chat/
Apache License 2.0

[Discussion] Genie has the highest score in the world on SWE-Bench #1065

Closed therealtimex closed 3 months ago

therealtimex commented 3 months ago

Issue

I came across https://cosine.sh/. Their SWE approach is promising. I think it's inspiring and Aider can learn from it.

Here's a summary of Genie:

Genie is a groundbreaking AI software engineering model developed by Cosine, a human reasoning lab. It has achieved remarkable performance on industry-standard benchmarks:

Key Features

Unique Approach

Cosine trained Genie on data that codifies human reasoning derived from real examples of software engineers working. This data represents:

Demonstration

In a demo, Genie solved a real GitHub issue in 84 seconds by:

  1. Fetching the issue
  2. Retrieving relevant files
  3. Writing and iteratively improving code
  4. Using debugging tools
  5. Trying multiple approaches
  6. Opening a PR with title and description
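The steps above can be sketched as a simple control loop. This is only an illustrative sketch of the workflow as described in the demo, not Cosine's implementation, which has not been published. Every name here (`solve_issue`, `write_patch`, `run_checks`, `PullRequest`) is hypothetical, and steps 1 and 2 (fetching the issue and retrieving files) are assumed to have happened already, with their results passed in as plain strings.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PullRequest:
    # Hypothetical container for step 6: a PR with a title and description.
    title: str
    description: str
    patch: str


def solve_issue(
    issue_text: str,
    approaches: List[str],
    write_patch: Callable[[str, str, str], str],
    run_checks: Callable[[str], Tuple[bool, str]],
    max_revisions: int = 3,
) -> Optional[PullRequest]:
    """Try each approach in turn, revising the patch until checks pass."""
    for approach in approaches:  # step 5: try multiple approaches
        feedback = ""
        for _ in range(max_revisions):  # step 3: write and iteratively improve
            patch = write_patch(issue_text, approach, feedback)
            ok, feedback = run_checks(patch)  # step 4: debugging tools
            if ok:
                # step 6: open a PR with a title and description
                return PullRequest(
                    title=f"Fix: {issue_text[:50]}",
                    description=f"Approach: {approach}",
                    patch=patch,
                )
    return None  # no approach produced a passing patch
```

In a real agent, `write_patch` would call a model and `run_checks` would run tests or a debugger; here they are injected as callables so the loop itself stays self-contained.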

Genie represents a significant advancement in AI-driven software development, demonstrating human-like problem-solving capabilities and efficiency in tackling complex coding tasks. It could be an interesting comparison point or inspiration for the Aider project, showcasing the potential of AI in software engineering.

Version and model info

No response

paul-gauthier commented 3 months ago

Thanks for filing this issue. I saw their announcements and reviewed their SWE Bench submission. They didn't provide much detail beyond hand-waving descriptions, and they refused to show their trajectories to the SWE Bench team. So it's pretty hard to have confidence in their result or to conclude much about their approach.

therealtimex commented 3 months ago

I also noticed that they didn't include Aider in the benchmark.

paul-gauthier commented 3 months ago

I'm going to close this issue for now, but feel free to add a comment here and I will re-open it, or file a new issue any time.