BK-SCOSS / sctokenizer

A Source Code Tokenizer
MIT License

Incorrect processing of C/C++ escape-sequence in character constant #14

Open Derek-Jones opened 1 month ago

Derek-Jones commented 1 month ago

The C/C++ escape sequence in the character constant '\\' is processed incorrectly, so the characters up to the next single quote after it are treated as a single token:

a == '\\' && b == '|'
(a, TokenType.IDENTIFIER, (1, 1))
(==, TokenType.OPERATOR, (1, 3))
(', TokenType.SPECIAL_SYMBOL, (1, 6))
(\\' && b == , TokenType.CONSTANT, (1, 7))
(', TokenType.SPECIAL_SYMBOL, (1, 19))
(|, TokenType.OPERATOR, (1, 20))
(', TokenType.SPECIAL_SYMBOL, (1, 21))
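
For reference, the expected behaviour can be sketched in a few lines of Python. This is only an illustration of escape-aware scanning, not code from sctokenizer: the helper name `scan_quoted` and its signature are hypothetical, and it assumes the literal starts at a known index and does not span lines.

```python
def scan_quoted(src: str, start: int) -> int:
    """Return the index just past the closing quote of the literal at src[start].

    Hypothetical helper for illustration; not part of sctokenizer.
    """
    quote = src[start]
    if quote not in ("'", '"'):
        raise ValueError("src[start] must be a quote character")
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes, so that
            # neither \\ nor \' can be mistaken for the end of the literal.
            i += 2
            continue
        if ch == quote:
            return i + 1
        if ch == "\n":
            break  # assumption: literals do not span lines
        i += 1
    raise ValueError("unterminated literal")


line = r"a == '\\' && b == '|'"
first = line.index("'")
first_end = scan_quoted(line, first)
second = line.index("'", first_end)
second_end = scan_quoted(line, second)
print(line[first:first_end], line[second:second_end])
```

With the input from this report, the first literal ends right after the two backslashes, so `&& b ==` is left for the tokenizer to handle as ordinary operators and identifiers.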

Dec1mo commented 1 month ago

Thanks for informing us of this bug, Derek. Unfortunately, we no longer maintain this repository. But we do welcome contributions, so you are welcome to open a PR to fix this error. Cheers!

Derek-Jones commented 3 weeks ago

I looked at the source, and it looks like a fix would involve a lot of reorganizing, so I am not keen to attempt one.