buildkite / terminal-to-html

Converts arbitrary shell output (with ANSI) into beautifully rendered HTML
http://buildkite.github.io/terminal-to-html
MIT License

Streaming mode #126

Closed DrJosh9000 closed 2 months ago

DrJosh9000 commented 6 months ago

The goal

Allow terminal output to be processed in a streaming manner, at the expense of not buffering the full history of the input. Enable much larger input to be processed without exhausting memory.

Why

The memory improvements of the last couple of PRs (#121, #124) whittle the memory allocated per run for the NPM benchmark from ~24MiB down to ~14MiB, but on a ~0.5MiB input that's still a hefty blowup.

The overwhelming majority of typical terminal output simply appends new lines. Sometimes a command on a PTY will erase the previous line (e.g. git). Some tools will rewrite the last 10 or so lines (bazel, docker). In all cases, sensible CLI tools should assume the terminal window is no larger than a few hundred lines.

So why do we need to buffer more lines than that?

By streaming out the lines we will never need to touch again, they no longer have to be kept in memory and can be freed. The total memory needed to process input can then be bounded, rather than scaling with the size of the input. This will enable displaying much larger logs.
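
To make the idea concrete, here's a minimal sketch of a bounded line buffer, with hypothetical names rather than this package's actual types:

    package main

    import (
        "io"
        "os"
    )

    // boundedScreen is a hypothetical sketch: it holds at most maxLines
    // lines in memory, streaming anything older straight to out.
    type boundedScreen struct {
        lines    [][]byte
        maxLines int
        out      io.Writer
    }

    func (s *boundedScreen) appendLine(line []byte) error {
        s.lines = append(s.lines, line)
        if s.maxLines > 0 && len(s.lines) > s.maxLines {
            // A line that has scrolled above the window can no longer be
            // revisited by cursor movement, so it is safe to emit and free.
            if _, err := s.out.Write(append(s.lines[0], '\n')); err != nil {
                return err
            }
            s.lines = s.lines[1:]
        }
        return nil
    }

    func main() {
        s := &boundedScreen{maxLines: 2, out: os.Stdout}
        for _, l := range []string{"one", "two", "three", "four"} {
            _ = s.appendLine([]byte(l)) // "one" and "two" stream out immediately
        }
        // only "three" and "four" remain buffered (and rewritable) in memory
    }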

How

The key functionality was added in #118; the change here is to tidy up the main edge case (preserving parser state between chunks of input) and make streaming the default. Pass --buffer-max-lines=0 to disable the line limit. The limit applies in all modes: stdin, file, and the web service.
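
To illustrate the edge case: an escape sequence such as \x1b[31m can arrive split across two chunks, so the parser has to carry partial state into the next Write instead of resetting per chunk. A rough sketch with hypothetical names (not the actual parser):

    package main

    import "fmt"

    // chunkParser is a hypothetical sketch of state that must survive
    // between Write calls when input is fed in arbitrary chunks.
    type chunkParser struct {
        inEscape bool   // currently inside a CSI escape sequence
        seq      []byte // bytes of the unfinished sequence so far
    }

    func (p *chunkParser) Write(chunk []byte) (int, error) {
        for _, b := range chunk {
            switch {
            case p.inEscape:
                p.seq = append(p.seq, b)
                if b != '[' && b >= '@' && b <= '~' {
                    fmt.Printf("sequence: %q\n", p.seq) // complete: apply it
                    p.inEscape, p.seq = false, nil
                }
            case b == 0x1b: // ESC opens a sequence
                p.inEscape, p.seq = true, []byte{b}
            default:
                fmt.Printf("text: %q\n", b) // ordinary output byte
            }
        }
        return len(chunk), nil
    }

    func main() {
        p := &chunkParser{}
        // The split sequence still parses, because p keeps its state:
        p.Write([]byte("\x1b[3"))
        p.Write([]byte("1mred"))
    }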

But while I'm here...

Show me the benchmarks!

goos: darwin
goarch: arm64
pkg: github.com/buildkite/terminal-to-html/v3
                │ before.txt  │              after.txt              │
                │   sec/op    │   sec/op     vs base                │
RendererNpm-10    12.88m ± 1%   12.02m ± 0%   -6.64% (p=0.000 n=30)
StreamingNpm-10   13.64m ± 0%   11.31m ± 1%  -17.08% (p=0.000 n=30)
geomean           13.25m        11.66m       -12.02%

                │  before.txt   │              after.txt               │
                │     B/op      │     B/op      vs base                │
RendererNpm-10     13.92Mi ± 0%   11.19Mi ± 0%  -19.56% (p=0.000 n=30)
StreamingNpm-10   10.633Mi ± 0%   5.675Mi ± 0%  -46.62% (p=0.000 n=30)
geomean            12.16Mi        7.971Mi       -34.47%

                │ before.txt  │             after.txt              │
                │  allocs/op  │  allocs/op   vs base               │
RendererNpm-10    158.9k ± 0%   163.9k ± 0%  +3.14% (p=0.000 n=30)
StreamingNpm-10   158.9k ± 0%   158.2k ± 0%  -0.46% (p=0.000 n=30)
geomean           158.9k        161.0k       +1.32%

The streaming benchmark didn't exist before this PR, so to get a baseline I temporarily copied it into main (and fixed it so it would build). The speedups are mostly down to strings.Builder and avoiding casts to []byte. The 46% before/after memory saving in the streaming benchmark is primarily due to screen line recycling.
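
For the strings.Builder point, here's a standalone illustration (not the PR's actual code): the builder grows a single internal buffer and its String method returns the result without copying, whereas every []byte(s) or string(b) conversion allocates and copies, which adds up in a per-line render loop.

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        // Accumulate HTML into one growable buffer instead of converting
        // between string and []byte at each step.
        var b strings.Builder
        for i := 0; i < 3; i++ {
            fmt.Fprintf(&b, "<span>line %d</span>\n", i)
        }
        fmt.Print(b.String()) // no copy: Builder.String reuses its bytes
    }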

dabarrell commented 5 months ago

Is my understanding correct that this would effectively cap the maximum memory used by t2html - i.e. the limit is determined by maxLines, not by the size of the input? Have you tried putting a bigger input file through the benchmarks to see what that looks like, e.g. 5MB compared to 0.5MB?

DrJosh9000 commented 5 months ago

That's the goal: to limit the memory needed to process ever-larger input. In practice it will still allocate quite a lot in total, but the garbage collector can reclaim more of it along the way.

Here's the same benchmark on a 17MiB file (several copies of fixtures/npm.sh.raw):

BenchmarkRendererBig-10                3         356336375 ns/op        392413698 B/op   5244420 allocs/op
BenchmarkStreamingBig-10               4         322766854 ns/op        166112616 B/op   5042078 allocs/op

166 MB of memory allocations per run still sounds alarming, but that number is cumulative - it doesn't decrease as the garbage collector frees memory. If I stick a little diagnostic code at the end of parseToScreen to see what's actually resident:

    // requires "log" and "runtime" in the imports
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    // TotalAlloc only ever grows; HeapAlloc and HeapInuse track live heap memory
    log.Printf("TotalAlloc: %d\tHeapAlloc: %d\tHeapInuse: %d", m.TotalAlloc, m.HeapAlloc, m.HeapInuse)

and do a single run with the input streamed into the parser (io.Copy):

2024/01/18 11:26:16 TotalAlloc: 839816  HeapAlloc: 839816       HeapInuse: 1441792
2024/01/18 11:26:16 TotalAlloc: 1212656 HeapAlloc: 1212656      HeapInuse: 1761280
2024/01/18 11:26:16 TotalAlloc: 1558336 HeapAlloc: 1558336      HeapInuse: 2097152
...snip...
2024/01/18 11:26:17 TotalAlloc: 173126152       HeapAlloc: 1782400      HeapInuse: 2441216
2024/01/18 11:26:17 TotalAlloc: 173426456       HeapAlloc: 2082704      HeapInuse: 2727936
2024/01/18 11:26:17 TotalAlloc: 173618552       HeapAlloc: 2274800      HeapInuse: 2924544

HeapInuse is the most useful of the three; it hovered within ±1.7 MB of a mean of 3.1 MB.

The same thing without using MaxLines:

2024/01/18 11:34:43 TotalAlloc: 836776  HeapAlloc: 836776       HeapInuse: 1359872
2024/01/18 11:34:43 TotalAlloc: 1222880 HeapAlloc: 1222880      HeapInuse: 1744896
2024/01/18 11:34:43 TotalAlloc: 1527456 HeapAlloc: 1527456      HeapInuse: 2072576
...snip...
2024/01/18 11:34:43 TotalAlloc: 248025120       HeapAlloc: 154157880    HeapInuse: 166273024
2024/01/18 11:34:43 TotalAlloc: 248375520       HeapAlloc: 154508280    HeapInuse: 166453248
2024/01/18 11:34:43 TotalAlloc: 248631216       HeapAlloc: 154763976    HeapInuse: 166600704

i.e. heap usage grows and never shrinks, ending with 166 MB in use that the runtime could not free.
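
For reference, "streamed into the parser (io.Copy)" above means roughly the following shape; the parser itself is elided here and stood in by a placeholder io.Writer:

    package main

    import (
        "io"
        "log"
        "os"
    )

    func main() {
        f, err := os.Open("fixtures/npm.sh.raw") // one of the benchmark fixtures
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // io.Copy moves data through a fixed 32 KiB internal buffer, so only
        // one small chunk of the input is resident at a time, regardless of
        // how large the file is.
        var screen io.Writer = os.Stdout // placeholder for the real parser
        if _, err := io.Copy(screen, f); err != nil {
            log.Fatal(err)
        }
    }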