codemirror / codemirror5

In-browser code editor (version 5, legacy)
http://codemirror.net/5/
MIT License
26.82k stars 4.97k forks source link

Wrong diff in merge view #7106

Open cjayyy opened 1 month ago

cjayyy commented 1 month ago

Hi,

I ran into weird diff behavior when it shows change for non-changed line when it follows new line ending with comma. I tried to reproduce the issue in demo of underlying library (here) but with no success, therefore I think the problem might be with implementation to CM5.

https://github.com/user-attachments/assets/d3679517-64ee-43a9-9f44-a2db89a952fa

Can you please confirm the bug and hopefully suggest fix?

Thanks!

marijnh commented 1 month ago

Can you please confirm the bug

Without the inputs that lead to it, probably not.

tomasfejfar commented 1 month ago

It happened to me with the following:

Left:

CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    "Review_Count",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;

Right:

CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
    HAVING
        COUNT("Airline_Reviews"."id") > 1
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;

The Right line 27 with "Rating_StdDev", is incorrectly show as added.

tomasfejfar commented 1 month ago

image

var target = document.getElementById("view");
  target.innerHTML = "";
dv = CodeMirror.MergeView(target, {
    value: `CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    "Review_Count",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;`,
    origLeft: panes == 3 ? orig1 : null,
    orig: `CREATE OR REPLACE TABLE "Origin_Destination_Average_Rating" AS
WITH Aggregated_Reviews AS (
    SELECT 
        "Review_Routes"."origin" AS "Origin_Airport",
        "Review_Routes"."destination" AS "Destination_Airport",
        AVG("Airline_Reviews"."Overall_Rating"::integer) AS "Average_Rating",
        STDDEV("Airline_Reviews"."Overall_Rating"::integer) AS "Rating_StdDev",
        COUNT("Airline_Reviews"."id") AS "Review_Count"
    FROM 
        "Airline_Reviews"
    JOIN 
        "Review_Routes" 
    ON 
        "Airline_Reviews"."id" = "Review_Routes"."review_id"
    WHERE
        "Airline_Reviews"."Overall_Rating" != 'n'
    GROUP BY 
        "Review_Routes"."origin", 
        "Review_Routes"."destination"
    HAVING
        COUNT("Airline_Reviews"."id") > 1
)
SELECT 
    "Origin_Airport",
    "Destination_Airport",
    "Average_Rating",
    "Rating_StdDev",
    CASE 
        WHEN "Average_Rating" > "Average_Rating" + 2 * "Rating_StdDev" THEN 'High Outlier'
        WHEN "Average_Rating" < "Average_Rating" - 2 * "Rating_StdDev" THEN 'Low Outlier'
        ELSE 'Normal'
    END AS "Outlier_Status"
FROM 
    Aggregated_Reviews;`,
    lineNumbers: true,
    mode: "text/html",
    highlightDifferences: highlight,
    connect: connect,
    collapseIdentical: collapse
  });
marijnh commented 1 month ago

Oh, I see what's going on. If you look at the green underline, it matches the comma at the end of the line in the shorter document to a comma on the next line in the longer document (which is also a valid diff). There's multiple valid diffs for any given change like this, and this library doesn't really try to influence the diffing library to pick a particular one. As such, the change spans two lines when displayed, even though another diff could fit on a single line. I don't think this is going to change in this version of the library (@codemirror/merge for CM6 should behave better).

tomasfejfar commented 1 month ago

Oh, I see! So the root cause is "misleading highlighting" in our template where the actual diff is not highlighted, but only lines spanning the diff are highlighted. Thanks!

cjayyy commented 1 month ago

You are right. Thank you very much for the clarity!

cjayyy commented 1 month ago

Well, in the end I think the issue is still somewhere else. When trying it in CM6 there is still the same problem. But it doesn't seem to be just correctly showing diff on trailing comma, instead it shows/hides this diff based on the previous unrelated diff. Please see the video - in the beginning the comma on line 25 is not highlighted, but after adding some change to lines 20-21, the comma on line 25 is highlighted now.

https://github.com/user-attachments/assets/f415ffdd-05fd-4e2b-b276-ad1288eabe4a

tomasfejfar commented 1 month ago

But what constitutes the diff is still the diff library responsibility, not he code editor's, isn't it?

cjayyy commented 1 month ago

I don't know as the diff library has lots of options which CodeMirror doesn't propagate. I am not able to reproduce the issue in the diff library demo, therefore I am looking for wrong implementation in CodeMirror.

marijnh commented 1 month ago

But it doesn't seem to be just correctly showing diff on trailing comma, instead it shows/hides this diff based on the previous unrelated diff.

Again, diffs are not deterministic based on input. There's often multiple correct diffs for a given set of documents.

But it seems the line boundary alignment in @codemirror/merge wasn't working as well as I thought it was. Attached patch should help with that.

I am not able to reproduce the issue in the diff library demo, therefore I am looking for wrong implementation in CodeMirror.

I don't think diff_match_patch gives any guarantees on where it locates changes relative to line breaks, so I'm not really sure what 'reproducing the issue in the diff library' would look like.