gliese1337 / fast-myers-diff

Handle long inputs, normalized output, empirical results. #4

Closed o-alexandre-felipe closed 3 years ago

o-alexandre-felipe commented 3 years ago

Size of the input

The previous version used a Uint8Array to store the indices, which caused the algorithm to break for inputs longer than 256.

So we would need Uint16Array for inputs in the range from 256 to 65535, and Int32Array for anything longer. On the other hand, I did not notice much time difference for small inputs. If this is to be optimized for short inputs, I would suggest pre-allocating a buffer for inputs up to 256 (since it takes less than 0.5 kB of RAM), but in my opinion the added complexity would not pay off; what I am doing instead is to always use Int32Array.
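A minimal sketch of the kind of size-based selection in question (the function name and exact thresholds here are illustrative assumptions, not the library's actual code):

```ts
// Illustrative sketch: pick the smallest typed array whose element
// type can hold every index of the inputs being compared.
function allocIndexBuffer(
  maxIndex: number, // largest index that will be stored
  length: number,   // number of slots the algorithm needs
): Uint8Array | Uint16Array | Int32Array {
  if (maxIndex < 256) return new Uint8Array(length);    // fits in 8 bits
  if (maxIndex < 65536) return new Uint16Array(length); // fits in 16 bits
  return new Int32Array(length);                        // 32-bit fallback
}

// e.g., for inputs of lengths N and M (placeholder names):
// const buf = allocIndexBuffer(Math.max(N, M), bufferSize);
```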

Output normalization

In the package documentation it is stated that the diff is an array of blocks [xs, xe, ys, ye] representing the aligned mismatched segments. Consider the comparison of "a" and "b": the output could be [[0,0,0,1],[0,1,1,1]] or [[0,1,0,0],[1,1,0,1]], but both of these contain two consecutive contiguous segments that can be merged into [[0,1,0,1]]. The shorter output is of negligible benefit from the library's point of view, but it matters to a consumer of the API (like me).
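A minimal sketch of such a merge pass, assuming the [xs, xe, ys, ye] block format described above (illustrative only, not necessarily how the library normalizes internally):

```ts
type Block = [number, number, number, number]; // [xs, xe, ys, ye]

// Merge runs of contiguous blocks: block i+1 is folded into block i
// when it starts exactly where block i ends on both sides.
function mergeContiguous(blocks: Block[]): Block[] {
  const out: Block[] = [];
  for (const [xs, xe, ys, ye] of blocks) {
    const last = out[out.length - 1];
    if (last && last[1] === xs && last[3] === ys) {
      last[1] = xe; // extend the previous block in x
      last[3] = ye; // and in y
    } else {
      out.push([xs, xe, ys, ye]);
    }
  }
  return out;
}

// mergeContiguous([[0,0,0,1],[0,1,1,1]]) -> [[0,1,0,1]]
// mergeContiguous([[0,1,0,0],[1,1,0,1]]) -> [[0,1,0,1]]
```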

Critical analysis

When optimizing a high-level language for speed (and for modern CPUs), we may end up with some unexpected results. For instance, using bit shifts to represent multiplications is not good practice: in JavaScript it usually makes the code slower than plain multiplication. I verified this on the benchmark (you can check in your environment to see whether there is any killer improvement; for me it is worse) and replaced some bit shifts with multiplications. In particular, there is one definition of hmax, (L >> 2) + parity, that can be defined as (L + parity) / 2, and two assignments Z = (Math.min(N, M) + 1) * 2 that can be unified at the cost of only one redundant evaluation per call.
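A crude way to spot-check this kind of claim in a given environment (illustrative harness only; results vary widely across engines and JIT versions):

```ts
// Crude spot check: time shift vs. multiply. Call overhead and JIT
// warm-up dominate at this scale, so treat results as indicative only.
function bench(label: string, f: (x: number) => number): void {
  const t0 = performance.now();
  let acc = 0;
  for (let i = 0; i < 1e7; i++) acc = (acc + f(i)) | 0; // keep the result live
  console.log(label, (performance.now() - t0).toFixed(1), "ms", acc);
}

bench("shift",    (x) => x << 1);
bench("multiply", (x) => x * 2);
```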

Empirical results

The theoretical background you present is great, and it is very convincing if the reader goes deep into the analysis. The icing on the cake is having a way to measure it. I added the myers-diff library and am benchmarking some of the results, to see in practice how much improvement this gives.

o-alexandre-felipe commented 3 years ago

@gliese1337, have you seen this?

gliese1337 commented 3 years ago

I had not! Thanks for the comment. I'll take a look!

gliese1337 commented 3 years ago

OK, new version has been published. A few notes:

On the size of the input: I had intended to automatically adjust the buffer type based on the input size at some point; this spurred me to actually do so. So, it now dynamically picks between 8-bit, 16-bit, and 32-bit integers, rather than having the largest size hard-coded.

On output normalization: this was included as a post-processing step in the original version, but removed for speed and to make correctness easier to verify. Your implementation is an improvement over what I had to start with, so thanks!

two assignments Z = (Math.min(N, M) + 1) * 2 that can be unified at the cost of only one redundant evaluation per call.

This is a micro-optimization, but the cost is actually greater than one redundant evaluation per call; this is because there is in fact a code path which results in continuing the loop without re-assigning N or M, and thus not requiring any recomputation of Z. It's not super significant, but it does happen.
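A contrived toy of the control-flow shape being described (not the library's actual code): with the assignments unified at the top of the loop, Z gets recomputed even on iterations that leave N and M untouched.

```ts
// Toy illustration only. The unified form evaluates Z every iteration;
// the separate-assignment form would evaluate it only when N or M change.
let N = 8, M = 5, evaluations = 0, Z = 0;
const computeZ = (): number => {
  evaluations++;
  return (Math.min(N, M) + 1) * 2;
};

for (let i = 0; i < 6; i++) {
  Z = computeZ(); // unified form: evaluated on every iteration
  if (i % 3 === 0) {
    N -= 1; M -= 1; // the path where recomputing Z is actually needed
  }
  // other iterations continue with N and M unchanged, where the
  // separate-assignment form would have skipped computeZ() entirely
}
console.log(Z, evaluations); // 6 evaluations here vs. 3 with separate assignments
```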

And the empirical measurements are nice, so thanks again!