chakra-core / ChakraCore

ChakraCore is an open source JavaScript engine with a C API.
MIT License

Unicode: Many characters incorrectly treated as whitespace #3050

Open hackvertor opened 7 years ago

hackvertor commented 7 years ago
ັັັalert(ັັັ'LOL Edge'ັັັ)ັັັ 

Expected: Invalid token
Actual: alert function is called.

This behaviour only seems to happen before or after an identifier or object. If you place the characters inside an identifier, they are treated differently.

Test case:

eval(String.fromCharCode(1468)+'alert(1)'+String.fromCharCode(1468));
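
A probe along the following lines could be used to look for such code points. It is only a sketch and not necessarily how the list in this issue was produced; the helper name `findSkippedCodePoints` and the probe expression `(<ch>"")` are assumptions made for illustration. The character is placed at the start of a token, where a conforming engine accepts only WhiteSpace and LineTerminator code points.

```javascript
// Hypothetical helper: returns the code points in [from, to] that the engine
// silently skips when they appear at the start of a token.
function findSkippedCodePoints(from, to) {
  const skipped = [];
  for (let cp = from; cp <= to; cp++) {
    const ch = String.fromCharCode(cp);
    try {
      // `(<ch>"")` evaluates to the empty string only if <ch> is ignored.
      if (eval('(' + ch + '"")') === '') {
        skipped.push(cp);
      }
    } catch (e) {
      // SyntaxError (or any other error): the character was not skipped.
    }
  }
  return skipped;
}

// Combining Diacritical Marks block: a conforming engine should report none.
console.log(findSkippedCodePoints(768, 879));
```

On an engine with the behaviour described here, the logged array would presumably contain the combining marks instead of being empty.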

Many more characters exhibit this behaviour, from code point 768 (U+0300) up to 65343 (U+FF3F).

I've sorted them into ranges if this helps: 768-879,1155-1159,1425-1469,1471,1473-1474,1476-1477,1479,1552-1562,1611-1631,1648,1750-1756,1759-1764,1767-1768,1770-1773,1809,1840-1866,1958-1968,2027-2035,2070-2073,2075-2083,2085-2087,2089-2093,2137-2139,2276-2302,2304-2307,2362-2364,2366-2383,2385-2391,2402-2403,2433-2435,2492,2494-2500,2503-2504,2507-2509,2519,2530-2531,2561-2563,2620,2622-2626,2631-2632,2635-2637,2641,2672-2673,2677,2689-2691,2748,2750-2757,2759-2761,2763-2765,2786-2787,2817-2819,2876,2878-2884,2887-2888,2891-2893,2902-2903,2914-2915,2946,3006-3010,3014-3016,3018-3021,3031,3073-3075,3134-3140,3142-3144,3146-3149,3157-3158,3170-3171,3202-3203,3260,3262-3268,3270-3272,3274-3277,3285-3286,3298-3299,3330-3331,3390-3396,3398-3400,3402-3405,3415,3426-3427,3458-3459,3530,3535-3540,3542,3544-3551,3570-3571,3633,3636-3642,3655-3662,3761,3764-3769,3771-3772,3784-3789,3864-3865,3893,3895,3897,3902-3903,3953-3972,3974-3975,3981-3991,3993-4028,4038,4139-4158,4182-4185,4190-4192,4194-4196,4199-4205,4209-4212,4226-4237,4239,4250-4253,4957-4959,5906-5908,5938-5940,5970-5971,6002-6003,6068-6099,6109,6155-6158,6313,6432-6443,6448-6459,6576-6592,6600-6601,6679-6683,6741-6750,6752-6780,6783,6912-6916,6964-6980,7019-7027,7040-7042,7073-7085,7142-7155,7204-7223,7376-7378,7380-7400,7405,7410-7412,7616-7654,7676-7679,8255-8256,8276,8400-8412,8417,8421-8432,11503-11505,11647,11744-11775,12330-12335,12441-12442,42607,42612-42621,42655,42736-42737,43010,43014,43019,43043-43047,43136-43137,43188-43204,43232-43249,43302-43309,43335-43347,43392-43395,43443-43456,43561-43574,43587,43596-43597,43643,43696,43698-43700,43703-43704,43710-43711,43713,43755-43759,43765-43766,44003-44010,44012-44013,64286,65024-65039,65056-65062,65075-65076,65101-65103,65343
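
Collapsing a sorted list of code points into ranges like the ones above can be sketched as follows. The function name `toRanges` is hypothetical, and the input is assumed to be sorted ascending with no duplicates.

```javascript
// Hypothetical helper: collapse a sorted list of integers into "a-b" ranges.
function toRanges(sortedCodePoints) {
  const parts = [];
  let start = null;
  let prev = null;
  for (const cp of sortedCodePoints) {
    if (start === null) {
      start = prev = cp;            // first element opens the first run
    } else if (cp === prev + 1) {
      prev = cp;                    // extend the current run
    } else {
      parts.push(start === prev ? String(start) : start + '-' + prev);
      start = prev = cp;            // open a new run
    }
  }
  if (start !== null) {
    parts.push(start === prev ? String(start) : start + '-' + prev);
  }
  return parts.join(',');
}

console.log(toRanges([1468, 1469, 1471, 1473, 1474, 1476, 1477, 1479]));
// prints "1468-1469,1471,1473-1474,1476-1477,1479"
```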

hackvertor commented 7 years ago

Interestingly, character 6158 (U+180E) can also be used between a function name and its opening parenthesis.

eval('alert'+String.fromCharCode(6158)+'(1)')

bterlson commented 7 years ago

@dilijev fyi, related to our conversation yesterday.

dilijev commented 7 years ago

I wonder what category or categories these characters have in common. Anyone know a quick way to query? My thought was that they are neither ID_START nor ID_CONTINUE, nor do they have any syntactic meaning otherwise, and so they are not treated as parts of the identifiers or any other syntax.

digitalinfinity commented 7 years ago

@dilijev is this a Windows Globalization vs ICU issue? Does either platform deal with this more correctly?

dilijev commented 7 years ago

@digitalinfinity I think it has to do with the parser, which is either using WinGlob (IIRC approximately Unicode 6.3, so there may have been changes since then) or tables in the source. Or, assuming all relevant characters are classified correctly, it might just be a logic error in the parser. This needs more investigation to find the root cause before blaming the i18n library specifically.

Only comment I have re: difference between WinGlob and ICU is: if it matters, it has to do with Unicode version, and ICU will be better than WinGlob there.

digitalinfinity commented 7 years ago

I think on Linux the parser does use ICU for classification, but IIRC there is additional logic in the PAL specifically around whitespace classification. On top of that, IIRC, the parser has additional logic to treat more code points as whitespace in particular compat modes, which makes the entire story very confusing 😄
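
For reference, the ECMAScript specification defines WhiteSpace as TAB, VT, FF, ZWNBSP (U+FEFF), and every code point in general category Zs (Space_Separator), which covers SP and NBSP. A minimal sketch of that check follows, assuming an engine with Unicode property escapes (ES2018). The function name is made up for illustration, and this is not ChakraCore's or the PAL's actual logic.

```javascript
// WhiteSpace per the ECMAScript lexical grammar (LineTerminator is separate).
const SPEC_WHITE_SPACE = /^[\t\v\f\ufeff\p{Zs}]$/u;

// Hypothetical helper: true if the code point is spec WhiteSpace.
function isSpecWhiteSpace(cp) {
  return SPEC_WHITE_SPACE.test(String.fromCodePoint(cp));
}

console.log(isSpecWhiteSpace(0x20));   // true  (SPACE, category Zs)
console.log(isSpecWhiteSpace(0xFEFF)); // true  (ZWNBSP)
console.log(isSpecWhiteSpace(0x0EB1)); // false (Lao combining mark)
```

The result for a given code point depends on the Unicode version the engine ships, since the membership of Zs has changed over time.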

dilijev commented 7 years ago

I'll check behavior with ch on Linux.

dilijev commented 7 years ago

Anyway, on Windows:

> eshost -is lol.js
## Source
// https://unicode-table.com/en/0EB1/
// neither ID_START nor ID_CONTINUE, I believe
ັັັprint(ັັັ'LOL'ັັັ)ັັັ

#### d8, node, node-nightly
SyntaxError: Invalid or unexpected token

#### sm
SyntaxError: illegal character:

#### ch-1.2.3, ch-1.3.2, ch-1.4.3, ch-master, ch-dev, node-ch
LOL

kunalspathak commented 7 years ago

So I tried checking some of the Unicode characters on http://www.fileformat.info/ and found a common pattern: Character.isJavaIdentifierPart() is YES for them. According to its description, isJavaIdentifierPart() returns true if the character is a combining mark, a non-spacing mark, or a letter (which includes Other_Letter). With that information, I ran through the sorted code points that @hackvertor provided and queried UnicodeData.txt. Most of the code points fall into one of the categories I mentioned. There is still a handful for which I am not sure what the common pattern is. But @dilijev, I hope that helps your investigation.

var fs = require('fs');

const loadfile = 'UnicodeData.txt';
var unicodeEntries = fs.readFileSync(loadfile, 'utf8').split('\n');
var map = new Map();

// ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html#Field Formats
var input = [768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 1155, 1156, 1157, 1158, 1159, 1425, 1426, 1427, 1428, 1429, 1430, 1431, 1432, 1433, 1434, 1435, 1436, 1437, 1438, 1439, 1440, 1441, 1442, 1443, 1444, 1445, 1446, 1447, 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1457, 1458, 1459, 1460, 1461, 1462, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1471, 1473, 1474, 1476, 1477, 1479, 1552, 1553, 1554, 1555, 1556, 1557, 1558, 1559, 1560, 1561, 1562, 1611, 1612, 1613, 1614, 1615, 1616, 1617, 1618, 1619, 1620, 1621, 1622, 1623, 1624, 1625, 1626, 1627, 1628, 1629, 1630, 1631, 1648, 1750, 1751, 1752, 1753, 1754, 1755, 1756, 1759, 1760, 1761, 1762, 1763, 1764, 1767, 1768, 1770, 1771, 1772, 1773, 1809, 1840, 1841, 1842, 1843, 1844, 1845, 1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863, 1864, 1865, 1866, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2070, 2071, 2072, 2073, 2075, 2076, 2077, 2078, 2079, 2080, 2081, 2082, 2083, 2085, 2086, 2087, 2089, 2090, 2091, 2092, 2093, 2137, 2138, 2139, 2276, 2277, 2278, 2279, 2280, 2281, 2282, 2283, 2284, 2285, 2286, 2287, 2288, 2289, 2290, 2291, 2292, 2293, 2294, 2295, 2296, 2297, 2298, 2299, 2300, 2301, 2302, 2304, 2305, 2306, 2307, 2362, 2363, 2364, 2366, 2367, 2368, 2369, 2370, 2371, 2372, 2373, 2374, 2375, 2376, 2377, 2378, 2379, 2380, 2381, 2382, 2383, 2385, 2386, 2387, 2388, 2389, 
2390, 2391, 2402, 2403, 2433, 2434, 2435, 2492, 2494, 2495, 2496, 2497, 2498, 2499, 2500, 2503, 2504, 2507, 2508, 2509, 2519, 2530, 2531, 2561, 2562, 2563, 2620, 2622, 2623, 2624, 2625, 2626, 2631, 2632, 2635, 2636, 2637, 2641, 2672, 2673, 2677, 2689, 2690, 2691, 2748, 2750, 2751, 2752, 2753, 2754, 2755, 2756, 2757, 2759, 2760, 2761, 2763, 2764, 2765, 2786, 2787, 2817, 2818, 2819, 2876, 2878, 2879, 2880, 2881, 2882, 2883, 2884, 2887, 2888, 2891, 2892, 2893, 2902, 2903, 2914, 2915, 2946, 3006, 3007, 3008, 3009, 3010, 3014, 3015, 3016, 3018, 3019, 3020, 3021, 3031, 3073, 3074, 3075, 3134, 3135, 3136, 3137, 3138, 3139, 3140, 3142, 3143, 3144, 3146, 3147, 3148, 3149, 3157, 3158, 3170, 3171, 3202, 3203, 3260, 3262, 3263, 3264, 3265, 3266, 3267, 3268, 3270, 3271, 3272, 3274, 3275, 3276, 3277, 3285, 3286, 3298, 3299, 3330, 3331, 3390, 3391, 3392, 3393, 3394, 3395, 3396, 3398, 3399, 3400, 3402, 3403, 3404, 3405, 3415, 3426, 3427, 3458, 3459, 3530, 3535, 3536, 3537, 3538, 3539, 3540, 3542, 3544, 3545, 3546, 3547, 3548, 3549, 3550, 3551, 3570, 3571, 3633, 3636, 3637, 3638, 3639, 3640, 3641, 3642, 3655, 3656, 3657, 3658, 3659, 3660, 3661, 3662, 3761, 3764, 3765, 3766, 3767, 3768, 3769, 3771, 3772, 3784, 3785, 3786, 3787, 3788, 3789, 3864, 3865, 3893, 3895, 3897, 3902, 3903, 3953, 3954, 3955, 3956, 3957, 3958, 3959, 3960, 3961, 3962, 3963, 3964, 3965, 3966, 3967, 3968, 3969, 3970, 3971, 3972, 3974, 3975, 3981, 3982, 3983, 3984, 3985, 3986, 3987, 3988, 3989, 3990, 3991, 3993, 3994, 3995, 3996, 3997, 3998, 3999, 4000, 4001, 4002, 4003, 4004, 4005, 4006, 4007, 4008, 4009, 4010, 4011, 4012, 4013, 4014, 4015, 4016, 4017, 4018, 4019, 4020, 4021, 4022, 4023, 4024, 4025, 4026, 4027, 4028, 4038, 4139, 4140, 4141, 4142, 4143, 4144, 4145, 4146, 4147, 4148, 4149, 4150, 4151, 4152, 4153, 4154, 4155, 4156, 4157, 4158, 4182, 4183, 4184, 4185, 4190, 4191, 4192, 4194, 4195, 4196, 4199, 4200, 4201, 4202, 4203, 4204, 4205, 4209, 4210, 4211, 4212, 4226, 4227, 4228, 4229, 4230, 4231, 4232, 4233, 
4234, 4235, 4236, 4237, 4239, 4250, 4251, 4252, 4253, 4957, 4958, 4959, 5906, 5907, 5908, 5938, 5939, 5940, 5970, 5971, 6002, 6003, 6068, 6069, 6070, 6071, 6072, 6073, 6074, 6075, 6076, 6077, 6078, 6079, 6080, 6081, 6082, 6083, 6084, 6085, 6086, 6087, 6088, 6089, 6090, 6091, 6092, 6093, 6094, 6095, 6096, 6097, 6098, 6099, 6109, 6155, 6156, 6157, 6158, 6313, 6432, 6433, 6434, 6435, 6436, 6437, 6438, 6439, 6440, 6441, 6442, 6443, 6448, 6449, 6450, 6451, 6452, 6453, 6454, 6455, 6456, 6457, 6458, 6459, 6576, 6577, 6578, 6579, 6580, 6581, 6582, 6583, 6584, 6585, 6586, 6587, 6588, 6589, 6590, 6591, 6592, 6600, 6601, 6679, 6680, 6681, 6682, 6683, 6741, 6742, 6743, 6744, 6745, 6746, 6747, 6748, 6749, 6750, 6752, 6753, 6754, 6755, 6756, 6757, 6758, 6759, 6760, 6761, 6762, 6763, 6764, 6765, 6766, 6767, 6768, 6769, 6770, 6771, 6772, 6773, 6774, 6775, 6776, 6777, 6778, 6779, 6780, 6783, 6912, 6913, 6914, 6915, 6916, 6964, 6965, 6966, 6967, 6968, 6969, 6970, 6971, 6972, 6973, 6974, 6975, 6976, 6977, 6978, 6979, 6980, 7019, 7020, 7021, 7022, 7023, 7024, 7025, 7026, 7027, 7040, 7041, 7042, 7073, 7074, 7075, 7076, 7077, 7078, 7079, 7080, 7081, 7082, 7083, 7084, 7085, 7142, 7143, 7144, 7145, 7146, 7147, 7148, 7149, 7150, 7151, 7152, 7153, 7154, 7155, 7204, 7205, 7206, 7207, 7208, 7209, 7210, 7211, 7212, 7213, 7214, 7215, 7216, 7217, 7218, 7219, 7220, 7221, 7222, 7223, 7376, 7377, 7378, 7380, 7381, 7382, 7383, 7384, 7385, 7386, 7387, 7388, 7389, 7390, 7391, 7392, 7393, 7394, 7395, 7396, 7397, 7398, 7399, 7400, 7405, 7410, 7411, 7412, 7616, 7617, 7618, 7619, 7620, 7621, 7622, 7623, 7624, 7625, 7626, 7627, 7628, 7629, 7630, 7631, 7632, 7633, 7634, 7635, 7636, 7637, 7638, 7639, 7640, 7641, 7642, 7643, 7644, 7645, 7646, 7647, 7648, 7649, 7650, 7651, 7652, 7653, 7654, 7676, 7677, 7678, 7679, 8255, 8256, 8276, 8400, 8401, 8402, 8403, 8404, 8405, 8406, 8407, 8408, 8409, 8410, 8411, 8412, 8417, 8421, 8422, 8423, 8424, 8425, 8426, 8427, 8428, 8429, 8430, 8431, 8432, 11503, 11504, 11505, 
11647, 11744, 11745, 11746, 11747, 11748, 11749, 11750, 11751, 11752, 11753, 11754, 11755, 11756, 11757, 11758, 11759, 11760, 11761, 11762, 11763, 11764, 11765, 11766, 11767, 11768, 11769, 11770, 11771, 11772, 11773, 11774, 11775, 12330, 12331, 12332, 12333, 12334, 12335, 12441, 12442, 42607, 42612, 42613, 42614, 42615, 42616, 42617, 42618, 42619, 42620, 42621, 42655, 42736, 42737, 43010, 43014, 43019, 43043, 43044, 43045, 43046, 43047, 43136, 43137, 43188, 43189, 43190, 43191, 43192, 43193, 43194, 43195, 43196, 43197, 43198, 43199, 43200, 43201, 43202, 43203, 43204, 43232, 43233, 43234, 43235, 43236, 43237, 43238, 43239, 43240, 43241, 43242, 43243, 43244, 43245, 43246, 43247, 43248, 43249, 43302, 43303, 43304, 43305, 43306, 43307, 43308, 43309, 43335, 43336, 43337, 43338, 43339, 43340, 43341, 43342, 43343, 43344, 43345, 43346, 43347, 43392, 43393, 43394, 43395, 43443, 43444, 43445, 43446, 43447, 43448, 43449, 43450, 43451, 43452, 43453, 43454, 43455, 43456, 43561, 43562, 43563, 43564, 43565, 43566, 43567, 43568, 43569, 43570, 43571, 43572, 43573, 43574, 43587, 43596, 43597, 43643, 43696, 43698, 43699, 43700, 43703, 43704, 43710, 43711, 43713, 43755, 43756, 43757, 43758, 43759, 43765, 43766, 44003, 44004, 44005, 44006, 44007, 44008, 44009, 44010, 44012, 44013, 64286, 65024, 65025, 65026, 65027, 65028, 65029, 65030, 65031, 65032, 65033, 65034, 65035, 65036, 65037, 65038, 65039, 65056, 65057, 65058, 65059, 65060, 65061, 65062, 65075, 65076, 65101, 65102, 65103, 65343];
// Index every UnicodeData.txt record by code point.
// Field 0 = code point (hex), field 2 = General_Category, field 4 = Bidi_Class.
for (const unicodeEntry of unicodeEntries) {
  if (!unicodeEntry) continue; // skip blank lines such as the trailing one
  const fields = unicodeEntry.split(';');
  const codePoint = parseInt(fields[0], 16);
  map.set(codePoint, { bidi: fields[4], unicodeEntry: unicodeEntry, general: fields[2] });
}

// Print every listed code point that is NOT in one of the expected classes.
for (const i of input) {
  const entry = map.get(i);
  if (entry !== undefined) {
    // Entry should be one of the following
    // NSM = Nonspacing_Mark    any nonspacing mark (Bidi_Class value)
    // Mc = Spacing_Mark    a spacing combining mark (positive advance width)
    // Mn = Nonspacing_Mark a nonspacing combining mark (zero advance width)
    // Lo = Other_Letter
    if (entry.bidi !== 'NSM' && entry.general !== 'Mc' && entry.general !== 'Mn' && entry.general !== 'Lo') {
      console.log(entry.unicodeEntry);
    }
  }
}
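
A similar query can be approximated without UnicodeData.txt on engines that support Unicode property escapes. The sketch below tallies code points by general category; the names `generalCategoryOf` and `tallyCategories` and the choice of candidate categories are assumptions for illustration. The category data comes from the engine's own Unicode version, which may differ from the file used above.

```javascript
// Candidate general categories to test for; extend as needed.
const CATEGORIES = ['Mn', 'Mc', 'Me', 'Lo', 'Nd', 'Pc', 'Cf', 'Zs'];
const MATCHERS = CATEGORIES.map(gc => [gc, new RegExp('^\\p{' + gc + '}$', 'u')]);

// Hypothetical helper: first matching category, or 'other' if none matches.
function generalCategoryOf(cp) {
  const ch = String.fromCodePoint(cp);
  for (const [gc, re] of MATCHERS) {
    if (re.test(ch)) return gc;
  }
  return 'other';
}

// Hypothetical helper: count how many code points fall into each category.
function tallyCategories(codePoints) {
  const counts = {};
  for (const cp of codePoints) {
    const gc = generalCategoryOf(cp);
    counts[gc] = (counts[gc] || 0) + 1;
  }
  return counts;
}

// Three code points taken from the reported list.
console.log(tallyCategories([0x0EB1, 1468, 8255])); // { Mn: 2, Pc: 1 }
```

Passing the full `input` array instead of the three sample values would show which categories the remaining handful belongs to.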

dilijev commented 7 years ago

Linux:

lol-eval.js

// https://unicode-table.com/en/0EB1/
// neither ID_START nor ID_CONTINUE, I believe
let c = String.fromCharCode(0x0EB1);
let ccc = `${c}${c}${c}`;
let str = `${ccc}print(${ccc}'LOL'${ccc})${ccc}`;
// print(str);
eval(str);

$ ch lol-eval.js
LOL

Based on that, I think it's just a parser logic issue.

dilijev commented 7 years ago

@kunalspathak Thanks!

I think ID_START and ID_CONTINUE (a superset of ID_START) make up the grammar for an identifier (which includes those marks), but I can't find the exact description of which classes make up those categories at the moment. (I believe there are also some extra characters or categories beyond the three you mentioned.)

ID_CONTINUE_EXCLUSIVE = ??
ID_START = ??
ID_CONTINUE : ID_START | ID_CONTINUE_EXCLUSIVE
identifier : ID_START [identifier_coda]
identifier_coda : ID_CONTINUE [identifier_coda]

Or if you like RegExp instead of BNF, identifiers match this pattern: /\p{ID_START}\p{ID_CONTINUE}*/
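
In an engine with Unicode property escapes, that pattern can be written as real JavaScript. This is a sketch rather than a full implementation of the spec grammar: the properties are spelled `ID_Start` and `ID_Continue`, the `u` flag is required, `$`, `_`, ZWNJ and ZWJ are added as in the ECMAScript IdentifierName production, and `\u` escapes and reserved words are ignored. The function name is hypothetical.

```javascript
// IdentifierName, ignoring \u escapes and reserved words.
const IDENTIFIER_NAME = /^[$_\p{ID_Start}][$\u200c\u200d\p{ID_Continue}]*$/u;

// Hypothetical helper: true if the string is shaped like an identifier.
function isIdentifierName(s) {
  return IDENTIFIER_NAME.test(s);
}

console.log(isIdentifierName('alert'));       // true
console.log(isIdentifierName('a\u0EB1'));     // true:  U+0EB1 may continue an identifier
console.log(isIdentifierName('\u0EB1alert')); // false: U+0EB1 may not start one
```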

dilijev commented 7 years ago

Found it:

In DerivedCoreProperties.txt from the UCD:

# Derived Property: ID_Start
#  Characters that can start an identifier.
#  Generated from:
#      Lu + Ll + Lt + Lm + Lo + Nl
#    + Other_ID_Start
#    - Pattern_Syntax
#    - Pattern_White_Space
#  NOTE: See UAX #31 for more information

<enumeration of code points>
# Total code points: 117007

# Derived Property: ID_Continue
#  Characters that can continue an identifier.
#  Generated from:
#      ID_Start
#    + Mn + Mc + Nd + Pc
#    + Other_ID_Continue
#    - Pattern_Syntax
#    - Pattern_White_Space
#  NOTE: See UAX #31 for more information

<enumeration of code points>
# Total code points: 119691
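
Given these two derived properties, one way to test a code point from the reported list is to ask whether it is ID_Continue without being ID_Start. The sketch below does that with property escapes; `identifierClass` is a hypothetical name, and the four sample code points are taken from the list in this issue. Whether every reported code point falls into the same class is not verified here.

```javascript
// Hypothetical helper: classify a code point against the two derived properties.
function identifierClass(cp) {
  const ch = String.fromCodePoint(cp);
  if (/^\p{ID_Start}$/u.test(ch)) return 'start';
  if (/^\p{ID_Continue}$/u.test(ch)) return 'continue-only';
  return 'neither';
}

// U+0EB1 and U+05BC are Mn; U+203F and U+FF3F are Pc.
const sample = [0x0EB1, 1468, 8255, 65343];
console.log(sample.map(identifierClass));
// prints [ 'continue-only', 'continue-only', 'continue-only', 'continue-only' ]
```

Mapping the whole list through `identifierClass` would show quickly whether any entries are not of the continue-only kind.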

dilijev commented 7 years ago

Related #3271 #1208

dilijev commented 7 years ago

Removing my assignment. It seems this issue is not related to the i18n library; it might be an error in the parser logic.

EdMaurer commented 7 years ago

I don't see a lot of impact from this issue. Considering closing. Is this affecting anyone's scenario?

dilijev commented 7 years ago

I think we should track this in the Backlog rather than closing it since we haven't fixed the issue and it is a spec violation.

bterlson commented 7 years ago

Agreed, conformance+interop issues can never be closed, only prioritized against other items (possibly for years ;))