Closed kungfooman closed 3 years ago
Minimal example with unicode character π:
txt = ` Algebra(2,0,1,()=>{ console.log(π) });`
var tokens = [
  // 0: whitespace/comments
  /^[\s\uFFFF]|^[\u000A\u000D\u2028\u2029]|^\/\/[^\n]*\n|^\/\*[\s\S]*?\*\//g,
  // 1: literal strings
  /^\"\"|^\'\'|^\".*?[^\\]\"|^\'.*?[^\\]\'|^\`[\s\S]*?[^\\]\`/g,
  // 2: literal numbers in scientific notation (with small hack for i and e_ asciimath)
  /^\d+[.]{0,1}\d*[ei][\+\-_]{0,1}\d*|^\.\d+[ei][\+\-_]{0,1}\d*|^e_\d*/g,
  // 3: literal hex, nonsci numbers and regex (surround regex with extra brackets!)
  /^\d+[.]{0,1}\d*[E][+-]{0,1}\d*|^\.\d+[E][+-]{0,1}\d*|^0x\d+|^\d+[.]{0,1}\d*|^\.\d+|^\(\/.*[^\\]\/\)/g,
  // 4: punctuator
  /^(\.Normalized|\.Length|\.\.\.|>>>=|===|!==|>>>|<<=|>>=|=>|\|\||[<>\+\-\*%&|^\/!\=]=|\*\*|\+\+|\-\-|<<|>>|\&\&|\^\^|^[{}()\[\];.,<>\+\-\*%|&^!~?:=\/]{1})/g,
  // 5: identifier
  /^[A-Za-z0-9_]*/g
];
tok = [];
resi = [];
while (txt.length) {
  for (t in tokens) {
    if (resi = txt.match(tokens[t])) {
      tok.push([t | 0, resi[0]]);
      txt = txt.slice(resi[0].length);
      break;
    } // tokenise
  }
}
The problem is this code:
"πℇ".match(/^[A-Za-z0-9_]*/g)
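The result is a zero-length match: the * quantifier also accepts zero characters, so match() returns [""] instead of null.
Slicing 0 characters from the string leaves it unchanged, so the while loop never ends and keeps filling the tok array with empty ["5",""] tuples: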
tok
["5",""]
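One possible way to avoid the endless loop, sketched under assumptions: the `tokenize` helper, the abridged `demoTokens` list, and the Unicode-aware identifier pattern below are hypothetical and not the library's actual code. The idea is to accept only non-empty matches, so the input shrinks on every iteration, and to fail loudly when no rule matches. The identifier rule uses `+` instead of `*` and Unicode property escapes so that letters such as π are accepted.

```javascript
// Hypothetical helper illustrating the guard; not the library's tokenizer.
function tokenize(txt, tokens) {
  const tok = [];
  while (txt.length) {
    let matched = false;
    for (let t = 0; t < tokens.length; t++) {
      const resi = txt.match(tokens[t]);
      // Accept only non-empty matches so txt gets shorter on every pass.
      if (resi && resi[0].length > 0) {
        tok.push([t, resi[0]]);
        txt = txt.slice(resi[0].length);
        matched = true;
        break;
      }
    }
    // No rule consumed any input: report it instead of looping forever.
    if (!matched) {
      throw new SyntaxError(`Unexpected character ${JSON.stringify(txt[0])}`);
    }
  }
  return tok;
}

// Abridged, assumed token list for demonstration only.
const demoTokens = [
  /^\s+/,              // 0: whitespace
  /^[{}()\[\];.,=>]/,  // 1: single-character punctuators (abridged)
  /^[\p{L}\p{N}_]+/u,  // 2: identifier: Unicode letters, digits, underscore
];

console.log(tokenize("console.log(π)", demoTokens));
```

With this guard, an input character that no rule recognises raises a `SyntaxError` instead of filling `tok` with empty tuples. Whether the identifier rule should accept all Unicode letters or only a chosen set of symbols is a design decision for the library.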