cleanWithoutTrim creates in-memory copies of the contents of every ast.Text in a document. This causes a fair bit of unnecessary allocation.

This PR replaces cleanWithoutTrim with a streaming writer that implements the same behavior without copying byte slices in memory. Instead, it writes directly to the destination Writer.

To verify that the two implementations behave identically, this PR keeps cleanWithoutTrim in a test file, feeds the same fuzz inputs to both implementations, and compares their outputs. To try it yourself, check out this PR and run:

```
go test -run '^$' -fuzz . -v github.com/Kunde21/markdownfmt/v2/markdown
```

Results

This PR also adds a benchmark that measures the cost of rendering each input file in testfiles/. The results report one Render sub-benchmark per file.

Here is the benchstat comparison from before and after this change:

name                                   old time/op    new time/op    delta
Render/example1.input.md-2               34.4µs ± 2%    31.7µs ± 1%   -7.90%  (p=0.008 n=5+5)
Render/headers.same.md-2                 37.0µs ± 2%    40.0µs ±20%     ~     (p=0.690 n=5+5)
Render/html.input.md-2                   10.7µs ± 1%    12.9µs ±19%     ~     (p=0.151 n=5+5)
Render/lists.input.md-2                   137µs ± 3%     129µs ± 3%   -5.56%  (p=0.008 n=5+5)
Render/lists.same.md-2                   47.0µs ± 1%    42.7µs ± 0%   -9.08%  (p=0.008 n=5+5)
Render/nested-code.same.md-2             3.03µs ± 1%    2.79µs ± 3%   -7.89%  (p=0.008 n=5+5)
Render/reference.same.md-2                189µs ± 2%     170µs ± 4%  -10.01%  (p=0.008 n=5+5)
Render/successive.input.md-2             9.43µs ± 3%    7.53µs ± 1%  -20.15%  (p=0.008 n=5+5)
Render/things-inside-blocks.same.md-2     127µs ± 4%     112µs ± 1%  -11.48%  (p=0.008 n=5+5)
Render/widechar.input.md-2               4.69µs ± 6%    3.89µs ± 2%  -17.02%  (p=0.008 n=5+5)

name                                   old alloc/op   new alloc/op   delta
Render/example1.input.md-2               4.39kB ± 0%    3.19kB ± 0%  -27.32%  (p=0.008 n=5+5)
Render/headers.same.md-2                 9.27kB ± 0%    8.72kB ± 0%   -5.88%  (p=0.008 n=5+5)
Render/html.input.md-2                   2.15kB ± 0%    2.13kB ± 0%   -0.84%  (p=0.008 n=5+5)
Render/lists.input.md-2                  6.00kB ± 0%    4.18kB ± 0%  -30.41%  (p=0.008 n=5+5)
Render/lists.same.md-2                   2.26kB ± 0%    1.34kB ± 0%  -40.64%  (p=0.008 n=5+5)
Render/nested-code.same.md-2               432B ± 0%      288B ± 0%  -33.33%  (p=0.008 n=5+5)
Render/reference.same.md-2               25.6kB ± 0%    20.7kB ± 0%  -19.16%  (p=0.008 n=5+5)
Render/successive.input.md-2               520B ± 0%      328B ± 0%  -36.92%  (p=0.008 n=5+5)
Render/things-inside-blocks.same.md-2    18.3kB ± 0%    16.3kB ± 0%  -11.30%  (p=0.008 n=5+5)
Render/widechar.input.md-2               1.24kB ± 0%    1.12kB ± 0%   -9.84%  (p=0.008 n=5+5)

name                                   old allocs/op  new allocs/op  delta
Render/example1.input.md-2                  152 ± 0%        95 ± 0%  -37.50%  (p=0.008 n=5+5)
Render/headers.same.md-2                    217 ± 0%       180 ± 0%  -17.05%  (p=0.008 n=5+5)
Render/html.input.md-2                     30.0 ± 0%      28.0 ± 0%   -6.67%  (p=0.008 n=5+5)
Render/lists.input.md-2                     749 ± 0%       618 ± 0%  -17.49%  (p=0.008 n=5+5)
Render/lists.same.md-2                      264 ± 0%       196 ± 0%  -25.76%  (p=0.008 n=5+5)
Render/nested-code.same.md-2               14.0 ± 0%       8.0 ± 0%  -42.86%  (p=0.008 n=5+5)
Render/reference.same.md-2                  932 ± 0%       611 ± 0%  -34.44%  (p=0.008 n=5+5)
Render/successive.input.md-2               39.0 ± 0%      18.0 ± 0%  -53.85%  (p=0.008 n=5+5)
Render/things-inside-blocks.same.md-2       658 ± 0%       493 ± 0%  -25.08%  (p=0.008 n=5+5)
Render/widechar.input.md-2                 29.0 ± 0%      21.0 ± 0%  -27.59%  (p=0.008 n=5+5)

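Bytes and allocations per operation are down for every file. Run time improves for every file except headers.same.md and html.input.md. For those two files, benchstat reports no statistically significant change (~) because the new runs varied by up to ±20%. Note that I ran these benchmarks on a fairly low-end machine, so absolute times are higher than they would be on typical hardware: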

goos: linux
goarch: amd64
pkg: github.com/Kunde21/markdownfmt/v2/markdownfmt
cpu: Intel(R) Celeron(R) N4020 CPU @ 1.10GHz

To run the benchmark locally, check out this PR and run:

```
IMPORTPATH=github.com/Kunde21/markdownfmt/v2/markdownfmt
git checkout HEAD~ &&
  go test -run '^$' -bench . -v -benchmem -count 5 $IMPORTPATH | tee before.txt &&
  git checkout - &&
  go test -run '^$' -bench . -v -benchmem -count 5 $IMPORTPATH | tee after.txt &&
  benchstat before.txt after.txt
```

The final step requires benchstat. Install it first with:

```
go install golang.org/x/perf/cmd/benchstat@latest
```

perf: Don't copy contents of ast.Text #53