buildkite / terminal-to-html

Converts arbitrary shell output (with ANSI) into beautifully rendered HTML
http://buildkite.github.io/terminal-to-html
MIT License
642 stars 45 forks source link

Bug with rendering characters in Gradle progress output #96

Closed lox closed 1 year ago

lox commented 1 year ago

We are using the Agent ansi-timestamps experiment, which injects timestamps into the ANSI output using Application Program Commands. These should be rendered as invisible, much like control characters. It appears this isn't happening:

image

When rendered directly in iTerm2, we see the correct output (even with the ansi timestamps inline):

image

I suspect this will be affecting lots of different types of progress bars when ansi-timestamps is enabled.

I suspect this is because the handling of the APCs is appending an element to the virtual screen, which will then increase the X count, meaning it throws out any ANSI control characters that are clearing the progress bar part way through with backspace control chars (vs erasing the whole line).

lox commented 1 year ago

Test case: raw_ansi_output.log

pda commented 1 year ago

Ooh interesting 👍🏼

pda commented 1 year ago

I have a simplified reproduction based on that 👌🏼

You're on the money, the output is using CSI n D / CUB / Cursor Back by n characters to overwrite the progress bar, and each _bk;ts=… timestamp throughout the line is consuming one of those CUBs.

I'll try to turn that into a test case and fix.

pda commented 1 year ago

I wonder if this is also a bug in the agent's timestamping.

The timestamps in your raw_ansi_output.log appear in patterns roughly like this:

Hello <ESC>[0K <ESC>_bk;ts=… <ESC>[5D <ESC>[1B <ESC>[1A

Which means:

… That's not where we should be putting a timestamp. The <ESC>[0K is basically saying “I've overwritten part of this line, now ensure everything after the cursor is blanked, in case the old line was longer”. So we're not at the start of a line.

It would be nice if it didn't cause rendering bugs, but that's a separate issue — even if resolved, it would leave out of order timestamps etc.

pda commented 1 year ago

In the agent a log Prefixer is used to prefix lines with ANSI timestamp, and here's it's detection of escape codes that indicate a new line:

// This regex matches "Erase in Line" escape characters:
// [K Clear from cursor to the end of the line
// [1K Clear from cursor to beginning of the line
// [2K Clear entire line
var lineBreakEscapeCharRegex = regexp.MustCompile(`^\[[012]?K`)

We can see [K Clear from cursor to the end of the line is included intentionally, and also the equivalent [0K is handled by the regexp character class.

Here's the Wikipedia description of [nK:

Clears part of the screen. If n is 0 (or missing), clear from cursor to end of screen. If n is 1, clear from cursor to beginning of the screen. If n is 2, clear entire screen (and moves cursor to upper left on DOS ANSI.SYS). If n is 3, clear entire screen and delete all lines saved in the scrollback buffer (this feature was added for xterm and is supported by other terminal applications).

I imagine we want timestamps in the subset of cases where one of those clears the entire line; including perhaps [0K when the cursor is at the start of the line but not when it's mid-line. But maybe just not at all.

Fortunately ansi-timestamps is still in experiment status, so I think we have a bit more freedom to change this behaviour. /cc @moskyb