algesten / jsondiff

Structural JSON diff/patch tool.
Other
66 stars 27 forks source link

Data loss in diff/patch round trip #9

Closed DrLansing closed 12 years ago

DrLansing commented 12 years ago

Input:

    private static String[] more = new String[] {
        "{\"p:timeFrame\":{\"g:id\":\"ID_1bi4uybddb9711i1ih4o3qqwml\",\"g:relatedTime\":[{\"relativePosition\":\"Contains\",\"g:TimeInstant\":{\"g:id\":\"ID_2odwwe6m4fhj1i7uqavgp4mpv\",\"g:identifier\":{\"codeSpace\":\"JP1_02\",\"text\":\"C-day\"},\"g:timePosition\":{\"indeterminatePosition\":\"unknown\"}}},{\"relativePosition\":\"Contains\",\"g:TimeInstant\":{\"g:id\":\"ID_1gu4yx14on411od9yrrwbraia\",\"g:identifier\":{\"codeSpace\":\"JP1_02\",\"text\":\"D-day\"},\"g:relatedTime\":{\"relativePosition\":\"MetBy\",\"g:TimePeriod\":{\"g:id\":\"ID_158yq5cp2z5lm1nugs6byvrjve\",\"g:begin\":{\"x:href\":\"ID_2odwwe6m4fhj1i7uqavgp4mpv\",\"x:title\":\"C-day\"},\"g:end\":{\"nilReason\":\"Unknown\"},\"g:duration\":\"P10D\"}},\"g:timePosition\":{\"indeterminatePosition\":\"unknown\"}}},{\"relativePosition\":\"MetBy\",\"g:TimePeriod\":{\"g:id\":\"ID_190iv1hlow39r1c0gam6p8h02k\",\"g:begin\":{\"x:href\":\"ID_1gu4yx14on411od9yrrwbraia\",\"x:title\":\"D-day\"},\"g:end\":{\"nilReason\":\"Unknown\"},\"g:duration\":\"PT0S\"}}],\"g:beginPosition\":{\"indeterminatePosition\":\"unknown\"},\"g:endPosition\":{\"indeterminatePosition\":\"unknown\"}}}",
        "{\"p:timeFrame\":{\"g:id\":\"ID_1bi4uybddb9711i1ih4o3qqwml\",\"g:relatedTime\":[{\"relativePosition\":\"Contains\",\"g:TimeInstant\":{\"g:id\":\"ID_1gu4yx14on411od9yrrwbraia\",\"g:identifier\":{\"codeSpace\":\"JP1_02\",\"text\":\"D-day\"},\"g:timePosition\":\"2010-10-11T17:51:52.204Z\"}},{\"relativePosition\":\"MetBy\",\"g:TimePeriod\":{\"g:id\":\"ID_190iv1hlow39r1c0gam6p8h02k\",\"g:begin\":{\"x:href\":\"ID_1gu4yx14on411od9yrrwbraia\",\"x:title\":\"D-day\"},\"g:end\":{\"nilReason\":\"Unknown\"},\"g:duration\":\"PT0S\"}}],\"g:beginPosition\":{\"indeterminatePosition\":\"unknown\"},\"g:endPosition\":{\"indeterminatePosition\":\"unknown\"}}}"
    };

Test:

    @Test
    public void testDiffPatch() {
        String s = JsonDiff.diff(more[0], more[1]);
        System.out.println(s);
        String t = JsonPatch.apply(more[0], s);
        assertTrue(more[1].equals(t));
        System.out.println(more[1]);
        System.out.println(t);
    }

Analysis: Some data is lost. Compare this part of more[1]:

    "g:relatedTime" : [ {
      "relativePosition" : "Contains",
      "g:TimeInstant" : {
        "g:id" : "ID_1gu4yx14on411od9yrrwbraia",
        "g:identifier" : {
          "codeSpace" : "JP1_02",
          "text" : "D-day"
        },
        "g:timePosition" : "2010-10-11T17:51:52.204Z"
      }
    },

to this part of the result:

    "g:relatedTime" : [ {
      "g:TimeInstant" : {
        "g:timePosition" : "2010-10-11T17:51:52.204Z"
      }
    },
algesten commented 12 years ago

Exciting! I'll take a look.

algesten commented 12 years ago

I've distilled the problem down to this simple test case.

    @Test
    public void testAdjustArrayMutationBoundariesWithObjectDeletion() {

        String from = "{\"a\":[{\"b\":{\"id\":\"id1\"}},{\"b\":{\"id\":\"id2\"}}]}"; 
        String to = "{\"a\":[{\"b\":{\"id\":\"id2\"}}]}";

        String d = JsonDiff.diff(from, to);

        String p = JsonPatch.apply(from, d);
        Assert.assertEquals(to, p);

    }

It's that classic diff algorithm problem you sometimes see in diff -u when you insert/remove a method and it doesn't know exactly what the addition/removal boundaries are.

From:

/* 
 * Doc for a
 */
void a () {
}
/* 
 * Doc for c
 */
void b () {
}

To:

/* 
 * Doc for a
 */
void a () {
}
/* 
 * Inserted method
 */
void c () {
}
/* 
 * Doc for c
 */
void b () {
}

Diff:

prince$ diff -u a c
--- a   2011-12-13 08:57:06.000000000 +0100
+++ c   2011-12-13 08:56:59.000000000 +0100
@@ -4,6 +4,11 @@
 void a () {
 }
 /* 
+ * Inserted method
+ */
+void c () {
+}
+/* 
  * Doc for c
  */
 void b () {

Notice the last added row +/*. The user obviously didn't think of it that way - and for this diff it makes no difference to the end result. But the way jsondiff uses the algorithm it does.

This is what I try to fix in adjustArrayMutationBoundaries() - but it's broken when there are nested deep objects.

I need to think on how to solve that.

DrLansing commented 12 years ago

If I convert your test case to XML and use jxydiff, it gives the right answer:

from = id1id2 to = id2 answer = DeleteNode: id1 path 0:0

Maybe because there are more or better boundaries?

DrLansing commented 12 years ago

Oops. That's: <wrapper><a><b><id>id1</id></b></a><a><b><id>id2</id></b></a></wrapper> <wrapper><a><b><id>id2</id></b></a></wrapper> DeleteNode: <a><b><id>id1</id></b></a> path 0:0

algesten commented 12 years ago

A quick update. I'm getting there. The test data you provided uncovered two bugs, one which is solved (the one described above), the other I'm wrestling with. It'll get there.

algesten commented 12 years ago

I haven't give up. It was just harder to find time to look at it. I've checked in my test cases (that are breaking atm). testModifyWhileDeletingPreviousElement1 and testModifyWhileDeletingPreviousElement2 distills the problem down to the core.

DrLansing commented 12 years ago

Looks good. Couldn't break it.

algesten commented 12 years ago

Good to hear!