Make block template synchronization smarter using a LCS algorithm

oxyc commented 4 years ago

Is your feature request related to a problem? Please describe.

If you build a more advanced site based on post templates with eg, products and recipes using various nested blocks it can become quite a challenge to bring in template updates. At the moment the synchronizeBlocksWithTemplate simply iterates through the blocks and if it reaches one that doesn't match the template, it ignores it, creates a new block and moves on. This works well with changes but not when blocks have been added or removed since all subsequent blocks will be ignored.

Describe the solution you'd like

At minimum it would be nice to use a Longest Common Subsequence algorithm which would solve the case of removals and additions. Just to see how it would work I used jsondiffpatch. It probably doesn't make sense to pull in a whole library, so a LCS diff and patch method could be coded with Gutenberg in mind.

See https://benjamine.github.io/jsondiffpatch/demo/index.html?oxyc/dbe5c86f0a73dec7f5fc7f14219a5981 for an example diff

With changed, removed and added blocks solved it leaves the problem of moved blocks. Eg if a block becomes grouped or ungrouped it could potentially still understand that the block does exist. Not sure what sort of algorithm would work for this though...

See https://benjamine.github.io/jsondiffpatch/demo/index.html?oxyc/7cabc4ece3dcd15383d27d92db4d4fd0 for an example diff of this problem

Thoughts? Could this be reused with the block collaboration feature? There's been discussions regarding more granular template locking for attributes, should this take into account that attributes might need to be synced in the future (in which case a third party package might make more sense)?

oxyc commented 4 years ago

Tested using https://github.com/WordPress/gutenberg/pull/23129/files#diff-a34e93c7fc85c3df865777e90a6d4181 and fixes removal/additions unless it happens both above and below a block, but that's already an improvement.

dmsnell commented 4 years ago

@oxyc are you proposing adding a new view for someone who is updating their templates which would show the changes, as in your linked demo?

oxyc commented 4 years ago

No, I'm proposing a smarter sync algorithm between the content blocks and a template. See https://cloudup.com/cwYsGHzXH5i for a before and after of what a Longest Common Subsequence algorithm does in a demo branch.

Since this is still pretty basic merging and performance doesn't really matter I bet something like a recursive variant of what @epiqueras uses here https://github.com/WordPress/gutenberg/pull/23129/files#diff-a34e93c7fc85c3df865777e90a6d4181 would do to produce what the demo does using jsondiffpatch. The current version of the simpleDiff in the Block collaboration PR would handle everything except for the center column where it would lose the paragraph. Making it recursive and searching from the end as well would fix it I guess.

Ideally it could be even smarter, taking into account how gutenberg blocks work (eg. grouping, ungrouping), but that might make the solution overly complex and at least I cant come up with anything.

dmsnell commented 4 years ago

Can you describe the algorithm which you think would work best for this?

I'm very familiar with diff-match-patch libraries and json-diffing but I'm less aware of how this directly applies to resolving template differences. It seems like many situations could have non-trivial resolution, unless we're still talking about wiping out everything that isn't first-in-line and still-there in the new template.

oxyc commented 4 years ago

Great, because I'm not familiar so maybe you can guide me a bit :)

The current synchronization is based on indexes in the block list and then matching based on block names. I'd say transformations are out of scope as it would cause errors and confusion. Therefore matching based on block names is sufficient but the block list index matching is where sync should be improved.

Rather than matching blocks based on indexes I'd say it needs to find the fewest number of changes to make blocks match the template. Reading about Myers Diff algorithm it sounds like this would be called generating the Shortest Edit Script (SES) by finding the Longest Common Subsequences (LCS). If the synchronization did this (with whatever algorithms) we'd ensure as much of the previous content was kept as possible, and therefore keeping the amount of manual changes at a minimum. Even if some content didn't move entirely correct, the content is there and can be copied, rather than wiped and retrieved from post revisions.

@dmsnell do you agree with this conclusion, or am I off track?

I'm not sure which algorithms would solve this in the best way. How jsonpatchdiff does it works pretty great in my opinion. But it could be simplified since our use case is also much simpler. Obviously we're not doing json diff-patch, I just used the library to make a POC of how it could work if made smarter.

dmsnell commented 4 years ago

@oxyc this makes more sense to me than originally when I read the issue. I think you are saying that when resolving templates we can adopt all of the blocks that the diffing algorithm thinks are equal between the versions, delete the now-extraneous ones from the content, and add in the defaults that are added in the update.

Reading about Myers Diff algorithm it sounds like this would be called generating the Shortest Edit Script (SES) by finding the Longest Common Subsequences (LCS)

To my knowledge most diffing libraries don't implement this exactly because the shortest-edit-script isn't usually as "human" friendly as other edit scripts which are less efficient.

If you are interested more you can read a helpful summary of diffing, written by Neil Fraser who wrote the diff-match-patch library.

are you interested at all in trying to create a patch to propose this? do you feel like you might be able to pull that off?

oxyc commented 4 years ago

this makes more sense to me than originally when I read the issue. I think you are saying that when resolving templates we can adopt all of the blocks that the diffing algorithm thinks are equal between the versions, delete the now-extraneous ones from the content, and add in the defaults that are added in the update.

I apologize for that, I'll try to clarify it in the issue summary later today.

are you interested at all in trying to create a patch to propose this? do you feel like you might be able to pull that off?

I think it sounds like a fun problem to solve–unless someone thinks this is the wrong approach, that it's not needed, or that the Block Collaboration feature will anyways provide an improved diff-match-patch functionality. So yeah unless there are objections, I'd be interested in working on it and sending a PR once I eventually have something that I think is acceptable (no timeline on that). As this hasn't been brought up before I guess it's not a priority but at least a welcome enhancement.

WordPress / gutenberg

Make block template synchronization smarter using a LCS algorithm #23345