edbee / edbee-lib

QWidget based Text Editor Component for Qt. Multi-caret, Textmate grammar and highlighting support.
Other
75 stars 26 forks source link

Option to show Unicode BIDI characters (CVE-2021-42574) #127

Closed vadi2 closed 3 years ago

vadi2 commented 3 years ago

See https://www.trojansource.codes, it is possible to sneak in Unicode control characters which re-order how source code is understood - and if the editor doesn't show them (none did by default), bad things can happen.

How VScode solved it: https://code.visualstudio.com/updates/v1_62#_unicode-directional-formatting-characters

gamecreature commented 3 years ago

Source code sample

https://github.com/microsoft/vscode/pull/136347/commits/d2c24cc1d1161193c46ac44364a053c082657d69

Finding these character isn't very hard. I to figure out how to renderer it. The render pipeline uses QTextLayout to render this stuff, I don't know if I can render several characters as one character.

Btw I tried the 'bad' code in Edbee.. The cursor behaves very strange when move past this code. (It move correctly from right to left)

Sample bad code:

//              from, to, amount
transferBalance(5678,‮6776,4321‬,"USD");
gamecreature commented 3 years ago

Work in progress...

image

This look good, but isn't working. The caret positioning doesn't respect the [U+202e] texts as single character. I don't know if this can be resolved easily ..

vadi2 commented 3 years ago

Does it treat it as multiple characters?

gamecreature commented 3 years ago

Yes, that's the problem for now.. It is treated as multiple characters, but editing works with the original characters, so it behaves very strange.

An alternative for now, could be showing a single error character... (Unfortunately you don't see the unicode this way)

image

vadi2 commented 3 years ago

I think that is a fine alternative 👍

gamecreature commented 3 years ago

@vadi2 Can I enable this feature by default? (I think you don't want to disable this)

Note to self, for VSCODE style rendering:

At the moment the x-position of the caret is calculated with a line from QTextLayout and QTextLine::cursorToX(column). It is only used in source files: It TextRenderer.cpp and TextEditorRenderer.cpp

Because the QTextLine contains the full text [U+XXXX] it will calculate the wrong x-position. Perhaps it is possible to override the QTextLine to offset column location supplied when calculating the x.

Other note to self:

Research QTextLayout.setPreeditArea? Is it usable to add pieces text that the cursor skips?

vadi2 commented 3 years ago

I agree, it should be enabled by default.

gamecreature commented 3 years ago

Btw the current solution isn't, perfect, it doesn't correctly update the other format-ranges. But I guess is sufficient for now, to improve security related issues

image

vadi2 commented 3 years ago

What do you mean by the other format ranges? Things like homoglyphs?

gamecreature commented 3 years ago

The syntax highlighting is shifted for the line with bidi-characters , because the control character is becoming visibile. (When you look at the example above. The 6 becomes white instead of the comma). This is probably is fixable, by adjusting the other FormatRanges supplied to the QTextLayout.

It's even worse, it also shifts the movement of the cursor!

vadi2 commented 3 years ago

Yeah, I see. Well we'll expose an option to turn it off should anyone actually be working with bidirectional text - which isn't a usecase we have in Mudlet.

vadi2 commented 3 years ago

Could you add it a as a configurable document option, so we can allow users to change this at runtime?

gamecreature commented 3 years ago

I just implement it with a TextEditorConfig::renderBidiContolCharacters (which defaults to true) Further adopted another character as warning symbol: ␥ (http://unicode.org/charts/PDF/U2400.pdf) This character doesn't influence the caret-movement en syntax coloring. (So it's much better)

Further this is how it looks with renderBidiContolCharacters = false

image

And this with renderBidiContolCharacters = true

image

Also the control characters are disabled, because they are replaced. So the order of the source-code remains.

vadi2 commented 3 years ago

Great, thanks!