glimmerjs / glimmer-vm

MIT License
1.13k stars 191 forks source link

[perf] replace Math.max usage with reduce and pure compare #1520

Closed lifeart closed 11 months ago

lifeart commented 11 months ago

It seems reduce is 5 times faster for our case https://jsben.ch/Qosu7

image

Here is V8 Math.Max implementation: https://github.com/v8/v8/blob/cd81dd6d740ff82a1abbc68615e8769bd467f91e/src/js/math.js#L77-L102 (old) https://github.com/v8/v8/blob/04f51bc70a38fbea743588e41290bea40830a486/src/builtins/math.tq#L135 (new)

image

It seems it uses more code paths than we actually need:

In our case both args already numbers and we could take 1 step instead of 5

If we assume every step is an opcode and there is no caching, for case where we have 10 tags + 10 subtags, new comparison should took 100 opcodes, and old (with Math.max) - 500. This numbers quite relative to numbers we see in micro-bench.


Likely need to check macro-bench to verify

lifeart commented 11 months ago

Here is a comparison results with https://github.com/glimmerjs/glimmer-vm/pull/1515#issuecomment-1841497398

image

Last image - math.max + all changes from linked PR.

I can't see dramatic improvements in performance, but I see difference in results accuracy here:

Pasted Graphic 1
NullVoxPopuli commented 11 months ago

Here is what I get on FireFox your bench image

me changing the input values image

NullVoxPopuli commented 11 months ago

thanks for running the benchmark and providing results / reasoning / research / etc! it helps a lot with understanding the why behind this change, rather than just the outcome

victor-homyakov commented 10 months ago

@lifeart @NullVoxPopuli please note that the benchmark https://jsben.ch/Qosu7 has some problems.

  1. Looks like "Code block 3" has an error in implementation:

    for (const tag of values) {
    result = Math.max(values[tag], result);
    //       ^ maybe Math.max(tag, result)
    }

    This may have some penalty in speed (actually negligible in my measurements). Anyways, you should pay more attention to the benchmarked code.

  2. The data under test is the series of increasing numbers from zero to 19 [0, 1, 2, ... 19, 0, 1, 2, ... 19, ...]. It is very predictable and may affect the benchmark results.

  3. The data creation is placed in the "boilerplate" part, which is part of the benchmark. Time for initialization of the array is added to every variant of code. Array initialization should be placed in the "setup" part, which isn't benchmarked.

  4. The way of array creation may affect performance. I.e. var values = []; ... values.push(...); (FixedArray with PACKED_SMI_ELEMENTS in V8) may behave differently from var values = new Array(5000); ... values[i] = ...; (FixedArray with HOLEY_SMI_ELEMENTS in V8). Benchmark should use the same creation method as in the original code.

Here is the improved benchmark: https://jsben.ch/eNWlG and its results:

Chrome 120 on Apple M1: image

Safari 17.2 on Apple M1: image

Firefox 122 on Apple M1: image

victor-homyakov commented 10 months ago

Results for Windows 10 64-bit on AMD Ryzen 7 5800H CPU:

Chrome 121 Benchmark Chrome 121 Windows

Firefox 122 Benchmark Firefox 122 Windows

NullVoxPopuli commented 10 months ago

seems like it's worth a PR, if you have the time! :tada: nice work!

victor-homyakov commented 10 months ago

I'd say that the classical for(;;) loop with if() is the best choice taking into account all the platforms it may run on.

victor-homyakov commented 9 months ago

seems like it's worth a PR, if you have the time! 🎉 nice work!

tsconfig has "noUncheckedIndexedAccess": true and to pass type checking the loop should look like

for (let i = 0; i < subtag.length; i++) {
  const value = subtag[i]![COMPUTE]();
  revision = Math.max(value, revision);
}

or

for (let i = 0; i < subtag.length; i++) {
  const tag = subtag[i];
  if (tag !== undefined) {
    const value = tag[COMPUTE]();
    revision = Math.max(value, revision);
  }
}

Which one do you prefer?

Fortunately, both variants show close results in benchmark https://jsben.ch/we4qL on current versions of Chrome, Firefox, and Safari.