Axosoft / diff3

10 stars 5 forks source link

Back-port memory improvements #1

Closed gitkraken-jacobw closed 2 years ago

gitkraken-jacobw commented 2 years ago

This PR is a little weird, mostly because onp.js is ancient JavaScript that doesn't read too well. To keep this PR within scope, I made my changes while trying to maintain the spirit of what was already there while still affording myself some improvements to help my sanity (for instance, variable names that are a little more descriptive than their one-letter counterparts).


Since this is a make or break for our merge conflict editor, I'm going to explain the logic behind my changes. onp is a two-way diff algorithm that we like for its theoretical performance. onp finds the shortest edit script between two input texts. onp is implemented a little weird because its basic implementation does not keep track of state, so we have to tack some state of our own onto it in order to return the shortest edit script at the end. The tracking of this state is what was wrong with the old version of onp.js, since it would use too much memory and crash. Anyways, back to the shortest edit script.

Essentially, a shortest edit script is a traversal of a line joining the two corners an edit graph: Screen Shot 2022-07-01 at 3 12 47 PM The old onp.js stored diagonal lines efficiently, while storing horizontal/vertical lines inefficiently. A diagonal line represented a path that spans multiple coordinates, so a line from (1, 1) to (5, 5) is represented with only one object. Vertical/horizontal lines needed an object for each coordinate, so around 5 objects for a line spanning (1, 1) to (1, 5). I realized that we only need to store diagonal lines, since the algorithm searches predictably enough to infer the entire shortest edit script from diagonal lines alone (changes to lines 81-87 in the new file). All lines are connected together in a linked list (the r property in the object P is an integer that indexes into the array pathposi: P[], which represents the object's parent), and I made it so that old diagonal lines in state could be looked up again and connected to new diagonal lines, forgoing the need to store horizontal/vertical lines (previously a series of horizontal/vertical lines would be created to connect two diagonal lines together. I simply cut out the middleman).

Additionally, the output format that we need for the diff3 portion of this library is not the full shortest edit script, so it turns out we don't even have to do any of the aforementioned inference to traverse the shortest edit script to begin with (if you look at the old version of onp.js, you will notice that they included an interpolator that iterates through the entire shortest edit script, but this code is not necessary. you can see that i replaced the old interpolator code with something else at lines 120-149 in the new file).

I did regression testing using the old diff3 as the source of truth and I wasn't able to find any failed regression tests. My regression testing was assembled from some shell scripts since I didn't really know how I would go about implementing it for JS, so let me know if you want to see my shell scripts ported over to JS.

gitkraken-jacobw commented 2 years ago

@Mr-Wallet

Mr-Wallet commented 2 years ago

Really I don't know how to efficiently review this... Right now my thinking is just to merge and publish so we can open a PR to gitkraken immediately, then play with it a little there and see if anything crops up. However long it took you to understand the problem well enough to see there was a JS-only optimization available, is about how long a reviewer would have to spend to understand everything properly, and we don't have that kind of time. đŸ¤·

I followed along with the description as best I could, and thanks for that, but that's a far cry from independently proving the correctness of the code. I'm just gonna take it on faith you did sufficient regression testing.