bytehide / TrojanSourceDetector4Dotnet

Small CLI tool to check whether your .NET projects have been affected by the Trojan Source vulnerability.

Need to correctly identify surrogates for UTF16 emojis #2

Status: Open. sharpninja opened this issue 2 years ago.

sharpninja commented 2 years ago

Some code I was examining kept producing hits on \uD83D\uDCCC, which turns out to be the UTF-16 surrogate pair for the pushpin emoji (U+1F4CC) and is perfectly valid in source code. Before I can use this tool and report its findings, we need a table of valid surrogate pairs to exclude.
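For reference, a well-formed UTF-16 surrogate pair (a high unit in D800 to DBFF followed by a low unit in DC00 to DFFF) always encodes exactly one code point above U+FFFF, so one way to avoid this kind of false positive is to decode pairs before deciding what to flag. The sketch below shows the standard UTF-16 decoding arithmetic in Python; the function names are hypothetical and not part of TrojanSourceDetector4Dotnet. On .NET, char.IsSurrogatePair and char.ConvertToUtf32 cover the same ground.

```python
from typing import List

# UTF-16 surrogate ranges (inclusive), fixed by the Unicode standard.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def is_high_surrogate(unit: int) -> bool:
    return HIGH_START <= unit <= HIGH_END


def is_low_surrogate(unit: int) -> bool:
    return LOW_START <= unit <= LOW_END


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a high/low surrogate pair into one code point above U+FFFF."""
    if not (is_high_surrogate(high) and is_low_surrogate(low)):
        raise ValueError(f"not a surrogate pair: {high:04X} {low:04X}")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def decode_utf16_units(units: List[int]) -> List[int]:
    """Convert UTF-16 code units to code points.

    Well-formed pairs are merged into one code point; a lone (unpaired)
    surrogate is passed through unchanged so a scanner can still report it.
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        if (is_high_surrogate(unit) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            code_points.append(decode_surrogate_pair(unit, units[i + 1]))
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points


# The escape from the report, \uD83D\uDCCC:
print(f"U+{decode_surrogate_pair(0xD83D, 0xDCCC):04X}")  # prints U+1F4CC
```

A scanner working on decoded code points would then compare U+1F4CC as a whole, rather than its two halves, against whatever it considers dangerous, while lone surrogates remain visible as findings.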

jespanag commented 2 years ago

Would creating a blacklist based on common emojis be a valid solution?

https://www.fileformat.info/info/emoji/list.htm

jespanag commented 2 years ago

@sharpninja take a look at https://unicode.org/Public/emoji/13.1/emoji-test.txt and https://regex101.com/r/mxJYXs/1

sharpninja commented 2 years ago

If there is a mathematical relationship in the definition of emojis, that would make scanning faster.


(Sent by email in reply to Juan España Garcia's comment of Nov 11, 2021, 1:48 PM.)


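On the question of a mathematical relationship: emojis are not defined by a single formula, but most of them sit in a handful of contiguous Unicode blocks, so a sorted range check gives a cheap pre-filter before any exact lookup. The Python sketch below assumes that approach; the block list is a hand-picked subset (it misses emoji such as U+2764 HEAVY BLACK HEART), and in_emoji_block and drop_emoji_hits are hypothetical names. For exact membership, the emoji-test.txt data linked in this thread is the better source.

```python
import bisect

# Inclusive (start, end) ranges of a few Unicode blocks where most emoji live,
# sorted by start. This is a hand-picked subset, NOT the full emoji set.
EMOJI_BLOCKS = [
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x1FA70, 0x1FAFF),  # Symbols and Pictographs Extended-A
]
_STARTS = [start for start, _ in EMOJI_BLOCKS]


def in_emoji_block(cp: int) -> bool:
    """Binary-search the sorted ranges: O(log n) per code point."""
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= EMOJI_BLOCKS[i][1]


def drop_emoji_hits(code_points):
    """Hypothetical filter: discard findings that fall inside the listed blocks."""
    return [cp for cp in code_points if not in_emoji_block(cp)]


print([f"U+{cp:04X}" for cp in drop_emoji_hits([0x1F4CC, 0x202E])])
# prints ['U+202E']
```

The bidirectional control characters behind Trojan Source (such as U+202E) lie outside these blocks, so a filter like this would not hide them; its weakness is the opposite one, letting emoji outside the listed ranges through as findings.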

sharpninja commented 2 years ago

https://unicode.org/Public/emoji/14.0/emoji-test.txt

This shows all the emojis, plus how to apply the modifiers to them. I may just strip it down to comma-separated lines to build a whitelist.
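The whitelist idea can also skip the comma-separated intermediate step by reading emoji-test.txt directly. The Python sketch below assumes the line layout seen in that file (hex code points separated by spaces, a semicolon, a status, then a # comment); parse_emoji_test and allowed_code_points are hypothetical names, and downloading the file is left out.

```python
from typing import Iterable, Optional, Set, Tuple


def parse_emoji_test(text: str,
                     statuses: Optional[Iterable[str]] = None) -> Set[Tuple[int, ...]]:
    """Return every emoji listed in emoji-test.txt as a tuple of code points.

    Data lines have the form
        <hex code points separated by spaces> ; <status> # <emoji> <version> <name>
    Blank lines and '#' comment lines are skipped. If `statuses` is given
    (e.g. {"fully-qualified"}), only entries with those statuses are kept.
    """
    wanted = set(statuses) if statuses is not None else None
    allowlist = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        data = line.split("#", 1)[0]  # drop the trailing comment
        if ";" not in data:
            continue
        field, status = [part.strip() for part in data.split(";", 1)]
        if wanted is not None and status not in wanted:
            continue
        allowlist.add(tuple(int(h, 16) for h in field.split()))
    return allowlist


def allowed_code_points(allowlist: Set[Tuple[int, ...]]) -> Set[int]:
    """Flatten the sequences into the individual code points they contain."""
    return {cp for seq in allowlist for cp in seq}


sample = "1F4CC ; fully-qualified # pushpin\n"
print(parse_emoji_test(sample))  # prints {(128204,)}
```

Keeping whole sequences is stricter than flattening them: the flattened set would also allow joiners such as U+200D ZERO WIDTH JOINER anywhere in a file, which a scanner may prefer to keep reporting when they appear outside an emoji sequence.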

jespanag commented 2 years ago

I'll close this as soon as it is on NuGet.