alacritty / vte

Parser for virtual terminal emulators
https://docs.rs/vte/
Apache License 2.0
242 stars 56 forks source link

Decide on the correct way of handling excess control sequence parameters #24

Closed VojtechStep closed 4 years ago

VojtechStep commented 5 years ago

In the current state, a control sequence that is supplied more than the maximum number of parameters weirdly smashes the trailing ones into a big number, behaving as if the parameter separators didn't exist. For example, the control sequence \e[38;2;0;255;0;48;2;255;0;0;1;2;3;4;5;6;0mHello gets parsed as [38, 2, 0, 255, 0, 48, 2, 255, 0, 0, 1, 2, 3, 4, 5, 60] - notice the 60 at the end.

AFAIK, the standard doesn't define any maximum number of parameters. Since I couldn't find any canonical way handle the excess of parameters, this is the behavior of other terminal emulators (data collected mostly by trying it on my machine and browsing code):

(Couldn't figure out what xterm does, seems to me like it just craps its pants and doesn't give a damn)

I'm inclined to implement the first option, but I don't have a strong opinion on it. The current way just doesn't seem "correct" in any shape or form and the VTE crate is the only one where I've seen this behavior so far.

chrisduerr commented 5 years ago

I'm not sure why it is the way it is right now, but if it brings any kind of performance benefits, I'd be inclined to keep it this way. Since it seems to be undefined behavior, we have the option to optimize things here.

VojtechStep commented 5 years ago

I don't think there are any performance gains from leaving it the way it is. Now if we get to the last parameter, we are still parsing the semicolons and numbers. We could instead set a flag that all the parameter slots are filled and do nothing until the final byte.

VojtechStep commented 5 years ago

@chrisduerr I wrote a simple benchmark to compare the performance of the current version and my implementation which ignores the excess parameters. If my methodology is correct, here are the results:

Current version:
test bench::excess_parameters ... bench:   4,018,670 ns/iter (+/- 4,489)
test bench::normal_parameters ... bench:       6,055 ns/iter (+/- 198)

New version:
test bench::excess_parameters ... bench:   4,019,946 ns/iter (+/- 8,280)
test bench::normal_parameters ... bench:       6,172 ns/iter (+/- 44)
nixpulvis commented 5 years ago

@VojtechStep I'm not sure how much value that benchmark is. Advancing 10,000 values should be about 1,000 times more expensive than advancing 10 values.

I believe @chrisduerr is optimistic that a solution to the problem you describe of squashing the last values together could guard against pathological parses of very long sequences of parameters. I'd frankly hope most applications don't provide such malicious sequences, but it's always good to perform well no matter what an application does. Are there any valid sequences that are anywhere near this long?

It sounds to me like the bigger question is what functionality we want to go with on this. I'm torn between the first two options:

  • Trash the entire control sequence - st, kitty, GNOME VTE (kitty seems to then print the sequence as normal printable characters)
  • Trash the excess parameters - apparently DEC VT100+ compatible emulators do this

I think I'm also in favor of the first option (with printing the sequence), since it makes the underlying issue as obvious as possible. Do we know of any applications that this change would break?

VojtechStep commented 5 years ago

@nixpulvis the idea was to test the parse time for both an invalid sequence (10000 parameters) and a valid sequence (10 parameters), to see whether there is any regression in normal usage.

And yes, my objective for this issue was to discuss what the correct behavior should be 👍.

I can't say I agree that the invalid sequence should be treated as printable characters though, as they are clearly not meant to be printed.

It is true however that we have no other way of logging it that I know of, because the crate is no_std, so we don't have access to any of the print functions.

I don't know of any applications that actively use control sequences this long, but from looking at the code of the emulators I included in the comparison, GNOME vte has the smallest number of max parameters and it's still double the amount we can handle, so I wouldn't be that surprised if it ever came up.

Still, if anyone was using long control sequences, it's not like they could work with the current version.

chrisduerr commented 5 years ago

I can't say I agree that the invalid sequence should be treated as printable characters though, as they are clearly not meant to be printed.

I think I agree with this point. Printing escape sequences that we are very certain are actual escape sequences (just too long) makes little sense to me. I don't really see how that could ever benefit the user.

nixpulvis commented 5 years ago

@chrisduerr yea, thinking about this with a fresh head, we should probably just log the error (when possible) and move on. I just like the idea that someone could detect the issue instead of completely silently failing. I know in Alacritty we can log errors, so I'm assuming there's a way to make this work, though we may need to integrate a logging crate or something IDK exactly right now.

VojtechStep commented 5 years ago

Yeah, since alacritty already depends on env_logger, we only need to add the parent log as a dependency.

VojtechStep commented 5 years ago

And because that is also no_std, this should not be a breaking change.