chalk / slice-ansi

Slice a string with ANSI escape codes
MIT License
47 stars 21 forks source link

Magic 19 number? #39

Open Shakeskeyboarde opened 3 months ago

Shakeskeyboarde commented 3 months ago

When an ESC/CSI is found, a character is parsed in the next 19 bytes utf-16 code units: https://github.com/chalk/slice-ansi/blob/a083b9586b576cf098b8895239a9b4a11f23aba9/index.js#L51

Why 19?

sindresorhus commented 3 months ago

I assume it's the max ansi escape length.

https://github.com/chalk/slice-ansi/commit/400a6ca5c23db8e71bf62d9ebf6082796ce5a7c6

sindresorhus commented 3 months ago

But maybe @AlCalzone can clarify. It was added in https://github.com/chalk/slice-ansi/commit/29f76a451d64757b8b833c7e34b25f139000857c.

AlCalzone commented 3 months ago

It's supposed to be the max. escape sequence length, essentially ESC[38;2;{r};{g};{b}m - ESC is one char, r, g, b can expand to 3 each.

Shakeskeyboarde commented 3 months ago

Thanks for clarifying. However, this means that if somehow the code doesn't end in an m, it just captures 19 code units and marks them as an ansi code? Even though it doesn't match the code pattern that the 19 value is derived from? Shouldn't it be considered not a code?

AlCalzone commented 3 months ago

Hmm I think you have a point there.

Qix- commented 3 months ago

https://github.com/chalk/slice-ansi/blob/a083b9586b576cf098b8895239a9b4a11f23aba9/index.js#L54

Reading the entire function is important for context. This function bails if the maximum length doesn't have an m, which in all implementations will bail the control code even if it's malformed.

I don't see an issue with the code as-is. If you think there's an issue, I'd love to see a repro.

AlCalzone commented 3 months ago

Technically it consumes the entire string instead of bailing, which is probably not what @Shakeskeyboarde expects. And I think ignoring the potential escape sequence at the current location might make more sense.

Qix- commented 3 months ago

I'm not sure what you mean. It consumes the whole string (by reference...) and iterates up to 19 times over it when looking to parse the sequence.

What issue are you really getting at?

AlCalzone commented 3 months ago

https://github.com/chalk/slice-ansi/blob/a083b9586b576cf098b8895239a9b4a11f23aba9/index.js#L79

In the case mentioned above, parseAnsiCode returns a 19-character substring of the input. As a result, code is truthy and the next iteration starts 19 characters later. This means if there is an actual ansi code in the 18 chars following the escape, it will be skipped.

Qix- commented 3 months ago

Ah okay. I see the point now.

This is also very wrong. there's no max length. \x1b[0;0;0;0;0;0;0;0;0;0;0;0m is a valid render mode escape. So is \x1b[0;1;38;2;123;123;123;48;2;456;456;456m, etc.

This shouldn't be constraining itself to an arbitrary length.

Shakeskeyboarde commented 3 months ago

Ah okay. I see the point now.

This is also very wrong. there's no max length. \x1b[0;0;0;0;0;0;0;0;0;0;0;0m is a valid render mode escape. So is \x1b[0;1;38;2;123;123;123;48;2;456;456;456m, etc.

This shouldn't be constraining itself to an arbitrary length.

A better solution would be to walk string until a terminator is reached, or an invalid character is encountered. So, match the CSI, then a number, then ; + number repeating, until m. If an invalid character is encountered, either ignore the whole thing, continuing from the CSI+1 character, or throw out everything up to the invalid character, assuming the code is incomplete.

That brings up another point. This doesn't seem to match \x1b[m, which is a valid reset code, because it never finds the number it's expecting.

Qix- commented 3 months ago

That's also correct, yeah. Worth mentioning that EOF should be considered as "invalid" when trying to parse a code and whatever came before it, after and including the CSI, should still be considered an escape. It's the closest you can get to how emulators would parse it, given that they're done using state machines.