chalk / slice-ansi

Slice a string with ANSI escape codes
MIT License

improve performance (+99.5% throughput) #37

Closed AlCalzone closed 1 year ago

AlCalzone commented 1 year ago

As mentioned in https://github.com/vadimdemedes/ink/issues/560, I'm having severe performance issues with this library. This PR is an attempt at making it faster.

I suggest reviewing the changes commit-by-commit, while reading the corresponding change description and benchmark. The last commit is a complete rewrite though.

I'm testing the impact of each change using the following benchmark:

import b from "benny";
import sliceAnsi from "./original.js";
// import optimized variants

// This might seem ridiculous, but it's from the test case I posted over at the `ink` issue.
const line = "\x1B[48;2;255;255;255m│\x1B[49m\x1B[37m╚═╝\x1B[39m\x1B[37m╚═╝\x1B[39m\x1B[37m╚═╝\x1B[39m\x1B[37m╚═╝\x1B[39m\x1B[37m╚═╝\x1B[39m\x1B[37m╚═╝\x1B[39m\x1B[37m╚═╝\x1B[39m\x1B[37m╚═╝\x1B[39m\x1B[37m╚═╝\x1B[39m\x1B[37m╚═╝\x1B[39m                                                                                                                                                                             \x1B[34m│\x1B[39m";

const add = (name, fn) =>
    b.add(name, fn, {
        initCount: 10000,
    });

b.suite(
    "sliceAnsi",
    add("original", () => {
        sliceAnsi(line, 0, 82);
        sliceAnsi(line, 82, 83);
        sliceAnsi(line, 83);
    }),
    // test optimized variants

    b.cycle(),
    b.complete()
);

The results vary a bit between runs, but I'll try to post comparable benchmarks for each change.

First change: Use a simple for loop instead of calling entries() on the character array

Result:

  original:
    21 976 ops/s, ±6.14%   | slowest, 10.01% slower

  for loop:
    24 420 ops/s, ±2.74%   | fastest
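To illustrate the idea (a hypothetical sketch, not the PR's actual diff): `entries()` allocates an iterator plus a fresh `[index, value]` pair on every step, while an index-based `for` loop reads the array directly.

```javascript
// Hypothetical sketch of the change, not the slice-ansi source itself.
const characters = [...'slice\u001B[31m-ansi'];

// Before: entries() creates an iterator and an [index, character]
// array for every element.
let visibleBefore = 0;
for (const [index, character] of characters.entries()) {
  if (character !== '\u001B') visibleBefore += 1;
}

// After: a plain for loop reads each element with no per-step allocation.
let visibleAfter = 0;
for (let index = 0; index < characters.length; index++) {
  if (characters[index] !== '\u001B') visibleAfter += 1;
}
```

Both loops visit the same elements; only the per-iteration allocation differs.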
AlCalzone commented 1 year ago

Second change: Use a string search instead of RegExp to match ansi codes

Result:

  original:
    23 692 ops/s, ±0.73%   | slowest, 17.61% slower

  for loop:
    27 710 ops/s, ±0.94%   | 3.64% slower

  for-loop + no regex:
    28 756 ops/s, ±0.58%   | fastest
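A minimal sketch of the idea (the real code handles more cases): instead of testing each position against a RegExp such as `/\u001B\[\d+m/`, check the two lead characters and scan for the terminator with `indexOf`. `skipSgr` is an illustrative name, not from the PR.

```javascript
// Hypothetical sketch: match a CSI ... 'm' (SGR) escape with plain string
// operations instead of a RegExp.
const ESC = '\u001B';

// Returns the index just past the escape sequence starting at `start`,
// or -1 if there is none. Only handles SGR ("...m") sequences here.
function skipSgr(input, start) {
  if (input[start] !== ESC || input[start + 1] !== '[') return -1;
  const end = input.indexOf('m', start + 2);
  return end === -1 ? -1 : end + 1;
}
```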
AlCalzone commented 1 year ago

Third change: Don't spread the string into an array, loop through codepoints directly

Result:

  original:
    24 256 ops/s, ±2.36%   | slowest, 41.81% slower

  for loop:
    28 719 ops/s, ±2.07%   | 31.1% slower

  for-loop + no regex:
    30 076 ops/s, ±1.17%   | 27.85% slower

  no regex + no spread:
    41 685 ops/s, ±0.89%   | fastest
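The gist, sketched for illustration: `[...str]` materializes a full intermediate array of code points, while `for...of` iterates the same code points lazily with no array allocation.

```javascript
// Hypothetical sketch: count code points without spreading into an array.
function countCodePoints(input) {
  let count = 0;
  // for...of walks code points (a surrogate pair counts once), matching
  // what [...input].length would report, minus the allocation.
  for (const _codePoint of input) count += 1;
  return count;
}
```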
AlCalzone commented 1 year ago

4th change: Avoid converting ansi codes to number for end code lookup

Result:

  original:
    27 402 ops/s, ±0.76%   | slowest, 41.57% slower

  (snip)

  no regex + no spread:
    43 911 ops/s, ±0.81%   | 6.36% slower

  no conversion between number and string:
    46 895 ops/s, ±0.98%   | fastest
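Illustrated with a hypothetical lookup table (the real mapping in slice-ansi comes from ansi-styles and is much larger): keeping keys and values as strings avoids a `Number()` parse and a later `String()` conversion for every escape code encountered.

```javascript
// Hypothetical sketch with a few SGR pairs; keys and values stay strings,
// so no Number(code) / String(endCode) round-trip is needed per lookup.
const END_CODE = new Map([
  ['31', '39'], // red foreground   -> default foreground
  ['37', '39'], // white foreground -> default foreground
  ['48', '49'], // background color -> default background
]);

function endCodeFor(startCode) {
  return END_CODE.get(startCode); // e.g. '37' -> '39'
}
```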
AlCalzone commented 1 year ago

5th change: Don't use string.split to access first char before first ";", use indexOf and slice

Result:

  original:
    22 828 ops/s, ±1.84%   | slowest, 49.89% slower

  (snip)

  no conversion between number and string:
    42 327 ops/s, ±0.90%   | 7.08% slower

  indexOf instead of splitting:
    45 553 ops/s, ±0.90%   | fastest
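Sketch of the idea: `split(';')` builds an array of every segment just to read the first one; `indexOf` plus `slice` extracts it without the throwaway array. (`firstParameter` is an illustrative name, not from the PR.)

```javascript
// Hypothetical sketch: get the part of an SGR parameter list before the
// first ';' without allocating split()'s result array.
function firstParameter(code) {
  const separator = code.indexOf(';');
  return separator === -1 ? code : code.slice(0, separator);
}
```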
AlCalzone commented 1 year ago

Last change: "Tokenize" input into ansi codes and characters, operate on pre-analyzed array

This one is much more involved. I basically rewrote everything to first analyze the input string and turn it into an array of ANSI codes and characters. Slicing is then done by operating on the array.

I had to change two tests though: one unnecessarily used two separate foreground colors, where the second overwrote the first. In another, the expected string had the end codes in the same order as the start codes, while all other tests had them in the opposite order. I didn't see any difference when printing those strings.

This change has a lot of potential though (+149% throughput):

  original:
    20 822 ops/s, ±1.70%

  (snip)

  indexOf instead of splitting:
    41 278 ops/s, ±4.02%

  tokenize:
    51 791 ops/s, ±2.93%

and if the result of the tokenization is memoized (export the tokenize function, and add a copy of the sliceAnsi function that operates on a token array instead of a string), repeated slices on the same input (like in the test case) are much faster (+300% throughput):

  tokenize, the smart way:
    84 850 ops/s, ±3.21%

FYI, I have TypeScript code for this change. Let me know if that is preferred.
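The two-phase idea might look roughly like this (a simplified sketch, not the PR's code: it only handles SGR sequences and does not re-emit the closing codes a real slice needs). Tokenization runs once; slicing then just walks the token array, which can be reused for repeated slices of the same line.

```javascript
// Hypothetical sketch of tokenize-then-slice. Phase 1: split the input
// into ANSI-code tokens and single-character tokens.
function tokenize(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    if (input[i] === '\u001B' && input[i + 1] === '[') {
      const end = input.indexOf('m', i + 2); // SGR sequences only
      tokens.push({ type: 'ansi', value: input.slice(i, end + 1) });
      i = end + 1;
    } else {
      tokens.push({ type: 'char', value: input[i] });
      i += 1;
    }
  }
  return tokens;
}

// Phase 2: slice by counting only visible characters; ANSI tokens seen
// before the slice end are passed through unchanged.
function sliceTokens(tokens, begin, end) {
  let visible = 0;
  let out = '';
  for (const token of tokens) {
    if (token.type === 'char') {
      if (visible >= begin && visible < end) out += token.value;
      visible += 1;
    } else if (visible < end) {
      out += token.value;
    }
  }
  return out;
}
```

Memoizing is then just holding on to the tokens: `const tokens = tokenize(line);` followed by several `sliceTokens(tokens, ...)` calls skips re-parsing the escapes each time.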

vadimdemedes commented 1 year ago

I think it'd be a lot easier for maintainers to review this PR, and your changes would be much more likely to get merged, if it were split up into much smaller PRs, each focused on an individual performance optimization. Right now it's hard to understand, because the whole library basically gets rewritten, which would be a tough sell.

AlCalzone commented 1 year ago

Guess I can rip out the last two commits and put them up as an alternative. The others consecutively build on each other and sometimes undo what a previous one did, so splitting them into individual PRs doesn't make much sense to me.

Edit: done