Issue:
It's possible to use non-ASCII characters in variable names, for example 限 = 1337.
限 is \u9650.
Since this Unicode character's most significant hex digit isn't D, the decompressor splits \u9650 into two separate ASCII characters, 96 and 50 (as in \u0096 and \u0050, or escaped: %96 and %50).
While \u0050 is P, \u0096 is not a valid character, hence the error:
限 is a convenient example because, when it is split into two characters, the first is invalid but the second is a valid one (P).
Most importantly, no variable P was defined alongside 限, so compressing it causes no conflicts.
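A minimal sketch of the split described above, assuming the decompressor has the usual obfuscatweet form, eval(unescape(escape(...).replace(/u../g, ''))). The function name decompress is hypothetical, and eval is left out so the resulting string can be inspected:

```javascript
// Assumed decompression step (eval omitted so the string can be inspected).
function decompress(packed) {
  // escape() turns a non-ASCII character into %uXXXX; removing "u" plus the
  // next two characters leaves %XX, which unescape() reads as one character.
  return unescape(escape(packed).replace(/u../g, ""));
}

console.log(escape("\u9650"));     // %u9650
console.log(decompress("\u9650")); // P
```

Under that assumption the 96 half is dropped by the regular expression and only the 50 half (P) survives.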
If you want to test it, try compressing d/24278 with a compressor like this one.
When you then decompress it again (replacing eval with throw, for example) you'll see that 限 was split and only P was kept!
This PR solves this kind of edge case, where the Unicode variable name is replaced upon compression by a valid and unused ASCII character.
If that variable were called 阈, we'd still have a problem, because that character (\u9608) is split into \u0096 (ignored) and \u0008 (invalid).
Another example of an error that I think can't be avoided can be reproduced by using 陸 (\u9678) as a variable name: the compression result would be x (\u0078), which is already defined and used by dwitter (and while JavaScript allows it to be redefined, doing so would probably lead to errors, for example if something like x.fillRect is used).
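The three cases above (限, 阈, 陸) can be told apart with a small check. This is only an illustrative sketch, not this PR's implementation: the helper names lowByteChar and classify are hypothetical, the RESERVED set contains only x because that is the only dwitter-defined name mentioned here, and the check does not look for clashes with other variables in the dweet itself.

```javascript
// Names assumed to be predefined by dwitter; only x is mentioned above.
const RESERVED = new Set(["x"]);

// Character that survives decompression: the low byte of the code unit.
function lowByteChar(ch) {
  return String.fromCharCode(ch.charCodeAt(0) & 0xff);
}

// Classify a single-character Unicode variable name.
function classify(ch) {
  const kept = lowByteChar(ch);
  // The surviving character must be able to start an ASCII identifier.
  if (!/^[A-Za-z_$]$/.test(kept)) return "invalid";
  // It must not shadow a name the runtime already provides.
  if (RESERVED.has(kept)) return "conflict";
  return "ok";
}

for (const ch of ["\u9650", "\u9608", "\u9678"]) {
  console.log(ch, JSON.stringify(lowByteChar(ch)), classify(ch));
}
```

With these assumptions 限 is classified as ok, 阈 as invalid and 陸 as a conflict with x.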
Both of these compression errors can be tested with https://xem.github.io/obfuscatweet/ and code like this dweet's: https://dwitter.net/d/24278