BK-SCOSS / sctokenizer

A Source Code Tokenizer
MIT License

Python and C tokenizers misidentify identifiers #12

Open gulaki opened 1 year ago

gulaki commented 1 year ago

```
import sctokenizer as st  # assumption: `st` is an alias for the sctokenizer package

st.CTokenizer().tokenize('typedef struct {uint8_t* __data_rhj_, long size} byte_array_t;')
```

returns

```
 (struct, TokenType.KEYWORD, (1, 9)),
 ({, TokenType.SPECIAL_SYMBOL, (1, 16)),
 (uint8_t, TokenType.IDENTIFIER, (1, 17)),
 (*, TokenType.OPERATOR, (1, 24)),
 (_, TokenType.SPECIAL_SYMBOL, (1, 26)),   --> this
 (_, TokenType.SPECIAL_SYMBOL, (1, 27)),   --> and this should be part of identifier.
 (data_rhj_, TokenType.IDENTIFIER, (1, 28)),
 (,, TokenType.OPERATOR, (1, 37)),
 (long, TokenType.KEYWORD, (1, 39)),
 (size, TokenType.IDENTIFIER, (1, 44)),
 (}, TokenType.SPECIAL_SYMBOL, (1, 48)),
 (byte_array_t, TokenType.IDENTIFIER, (1, 50)),
 (;, TokenType.SPECIAL_SYMBOL, (1, 62))]```

The Python tokenizer makes a similar mistake.
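
For reference, here is a minimal sketch of the behavior I would expect. It is not sctokenizer's implementation: `IDENTIFIER` and `find_identifiers` are hypothetical names made up for this illustration, and the pattern is the usual ASCII-only identifier rule (a letter or underscore, then letters, digits, or underscores). It does not separate keywords from identifiers and assumes single-line input; it only shows that leading underscores stay attached to the name.

```python
import re

# Hypothetical helper, not part of sctokenizer.
# ASCII-only simplification: an identifier starts with a letter or an
# underscore and continues with letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def find_identifiers(source):
    """Return (text, (line, column)) pairs for a single-line input.

    Columns are 1-based to match the positions in the report above.
    Keywords such as `struct` are matched too; this sketch does not
    classify tokens.
    """
    return [(m.group(), (1, m.start() + 1)) for m in IDENTIFIER.finditer(source)]


code = 'typedef struct {uint8_t* __data_rhj_, long size} byte_array_t;'
for token in find_identifiers(code):
    print(token)
# One of the printed tuples is ('__data_rhj_', (1, 26)): a single token
# starting at column 26 instead of '_', '_', 'data_rhj_'.
```

If the tokenizer scans character by character, the equivalent change would presumably be to treat `_` as a valid identifier start character before falling back to the special-symbol branch.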