New diff approach to making changes has not been very reliable.

carlos-dc commented 10 months ago

Hello team,

I just recently updated to Aider v0.19.1 and I see that it no longer uses SEARCH/REPLACE from python and instead it implements something more closely resembling git diffs.

What I have noticed is that, in the existing project I have been using aider on, it fails to make a valid change basically 100% of the time. It always leaves broken pieces of the old code immediately after the new code that it inserts. I have not had a single successful insert since yesterday. Here is an example of what I have been seeing.

This is a part of the diff generated for the above change:

```diff
--- /home/coder/dashclicks-app/src/routes/+page.svelte
+++ /home/coder/dashclicks-app/src/routes/+page.svelte
@@ ... @@
-const sendMessageToAssistant = async () => {
+const sendMessageToAssistant = async (): Promise<void> => {
@@ ... @@
-                       const createThreadResponse = await fetch('createThread', { method: 'POST' });
+                       const createThreadResponse = await fetch('/+page', {
+                               method: 'POST',
+                               headers: { 'x-action': 'createThread' }
+                       });
@@ ... @@
-                       await fetch('addMessage', {
+                       await fetch('/+page', {
                                method: 'POST',
+                               headers: {
+                                       'Content-Type': 'application/json',
+                                       'x-action': 'addMessage'
+                               },
                                body: JSON.stringify({ threadId, messageContent })
                        });
@@ ... @@
-                       const startRunResponse = await fetch('startRun', {
+                       const startRunResponse = await fetch('/+page', {
                                method: 'POST',
+                               headers: {
+                                       'Content-Type': 'application/json',
+                                       'x-action': 'startRun'
+                               },
                                body: JSON.stringify({ threadId })
                        });
@@ ... @@
-
```

You can see that the diff completely ignored the header lines that were already there, when it should have removed them with "-" lines.

This is happening on every single change that aider does in my project.
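To make the failure mode concrete, here is a minimal Python sketch of how a hunk that omits a `-` line can leave stray old code behind. It is not aider's actual patching logic: `apply_hunk_leniently` is a hypothetical, deliberately naive applier, and the `source` lines are an assumed reconstruction of what the original file may have looked like, since the thread does not show it.

```python
def apply_hunk_leniently(source, hunk):
    """Apply one diff hunk to a list of source lines.

    Hypothetical, naive applier for illustration only: it anchors on the
    first "before" line and replaces as many source lines as the hunk
    claims to cover, without checking that the rest of the hunk matches.
    """
    # "Before" lines are context (' ') plus removals ('-').
    before = [line[1:] for line in hunk if line[0] in " -"]
    # "After" lines are context (' ') plus additions ('+').
    after = [line[1:] for line in hunk if line[0] in " +"]
    start = source.index(before[0])
    return source[:start] + after + source[start + len(before):]


# Assumed original code: a headers line already exists.
source = [
    "await fetch('addMessage', {",
    "method: 'POST',",
    "headers: { 'Content-Type': 'application/json' },",
    "body: JSON.stringify({ threadId, messageContent })",
    "});",
]

# The hunk adds new headers but never removes the existing headers line.
incomplete_hunk = [
    "-await fetch('addMessage', {",
    "+await fetch('/+page', {",
    " method: 'POST',",
    "+headers: { 'x-action': 'addMessage' },",
    " body: JSON.stringify({ threadId, messageContent })",
    " });",
]

# The same hunk with the missing '-' line included.
complete_hunk = [
    "-await fetch('addMessage', {",
    "+await fetch('/+page', {",
    " method: 'POST',",
    "-headers: { 'Content-Type': 'application/json' },",
    "+headers: { 'x-action': 'addMessage' },",
    " body: JSON.stringify({ threadId, messageContent })",
    " });",
]

broken = apply_hunk_leniently(source, incomplete_hunk)
fixed = apply_hunk_leniently(source, complete_hunk)

print("\n".join(broken))  # ends with a duplicated "});" left over from the old code
print("\n".join(fixed))   # clean five-line result
```

Because the incomplete hunk describes four "before" lines while the real block has five, the naive applier stops replacing one line early and the old closing `});` is left dangling after the new code. A stricter applier would instead refuse the hunk because its "before" lines do not match the file.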

batmanscode commented 10 months ago

@carlos-dc what LLM are you using? Just curious

carlos-dc commented 10 months ago

> @carlos-dc what LLM are you using? Just curious

I am using gpt-4-1106-preview

batmanscode commented 10 months ago

@carlos-dc thanks. Personally, I've noticed a drop in quality with the turbo preview, in addition to an increased response time.

paul-gauthier commented 10 months ago

Thanks for trying aider and filing this issue. No editing format is going to work 100% reliably with any of the GPT models. I have been using a pair of extensive benchmarking suites to try and make informed, quantitative decisions when implementing and improving the editing formats. But even so, sometimes the LLM will mess up.

So please keep me posted on problems you are seeing. They might hint at possible ways to improve the editing format.

batmanscode commented 10 months ago

That's a great approach @paul-gauthier, appreciate the effort and that's probably the best one can do

Out of curiosity, how has the new turbo preview model compared to gpt-4 in your benchmarks?

Asking because I was using gpt-4-1106-preview when all of a sudden it got ~2x slower and started following instructions poorly enough that I had to switch to the regular gpt-4.

paul-gauthier commented 10 months ago

On benchmarks gpt-4-1106-preview seems to do better, but my sense is that gpt-4-0613 might actually be more capable at complex coding.

batmanscode commented 9 months ago

> On benchmarks gpt-4-1106-preview seems to do better, but my sense is that gpt-4-0613 might actually be more capable at complex coding.

Very interesting, thanks

jimcraner commented 9 months ago

@paul-gauthier:

(First, aider is awesome -- kudos to you for developing it! And thanks for releasing it! :-)

Second: is there a good way for us to provide data for you about this reliability problem?

For 0.18, using 4-turbo and the SEARCH/REPLACE model, aider worked great (Django, standalone python, Laravel, HTML/CSS, HTMX, vanilla JS)

Since upgrading to 0.19, and now 0.20, using the diff edit model, I have had zero successful edits. I've tried on multiple Django projects and a Laravel project.

I'm going to downgrade to 0.18 for now, but if there is any sort of useful information or data that we can provide to you, please let me know. I'd love to help improve the app!

Thanks again!

paul-gauthier commented 9 months ago

@jimcraner thanks for the info on the problems you are having.

You can try the latest version of aider, v0.21.0, which has some improvements to the unified diff editing format. Alternatively, you can always run aider with `--model gpt-4-1106-preview --edit-format diff` to use the old SEARCH/REPLACE edit format with GPT-4 Turbo.

I would love any concrete examples you have of editing failures. To be most useful, I need:

- Aider version, model, and edit-format settings. Ideally you can just copy all the "announcement" lines that aider prints when you run it, which report all of this and other helpful info.
- A copy of the diffs which failed to apply. You can find these in `.aider.chat.history.md`.
- A copy of the source file that was being updated, or at least the chunk that is mentioned in the failing diffs.
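
For anyone gathering the failing diffs, a small Python sketch like the following could pull them out of the chat history. It assumes the diffs appear in that file inside fenced blocks opened with three backticks followed by `diff`, as in the example earlier in this thread; `extract_diff_blocks` is a hypothetical helper, not part of aider.

```python
import re
from pathlib import Path

# Matches a fenced block that opens with ```diff and closes with ``` on
# its own line. Assumption: the history file fences diffs this way.
DIFF_FENCE = re.compile(r"^```diff\n(.*?)^```", re.DOTALL | re.MULTILINE)


def extract_diff_blocks(markdown_text):
    """Return the body of every ```diff fenced block in the given text."""
    return [match.group(1) for match in DIFF_FENCE.finditer(markdown_text)]


def load_diff_blocks(path=".aider.chat.history.md"):
    """Read the chat history file and return its diff blocks, if it exists."""
    history = Path(path)
    if not history.exists():
        return []
    return extract_diff_blocks(history.read_text(encoding="utf-8"))


# Small self-contained demonstration on an in-memory sample.
sample = (
    "Some chat text\n"
    "```diff\n"
    "--- a/file.py\n"
    "+++ b/file.py\n"
    "@@ ... @@\n"
    "-old\n"
    "+new\n"
    "```\n"
    "More chat text\n"
    "```python\n"
    "print('not a diff')\n"
    "```\n"
)
blocks = extract_diff_blocks(sample)
```

Running `load_diff_blocks()` from the project directory would then give the diff text to paste into a bug report, alongside the announcement lines and the relevant source chunk.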

paul-gauthier commented 9 months ago

I'm going to close this issue for now, but feel free to add a comment here and I will re-open it, or file a new issue at any time.

Aider-AI / aider

New diff approach to making changes has not been very reliable. #411