BK-SCOSS / sctokenizer

A Source Code Tokenizer
MIT License

Java Tokenizer with carriage returns fails to tokenize #11

Status: Closed. LakshyAAAgrawal closed this issue 1 month ago.

LakshyAAAgrawal commented 2 years ago

This is a very artificial example, but while processing snippets I found that running the Java tokenizer with

sctokenizer.tokenize_str("\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r}", lang='java')

raises the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sctokenizer/main.py", line 11, in tokenize_str
    return src.tokenize() 
  File "sctokenizer/source.py", line 85, in tokenize
    self.tokens = java_tokenizer.tokenize(self.source_str)
  File "sctokenizer/java_tokenizer.py", line 201, in tokenize
    self.add_pending(tokens, cur, TokenType.SPECIAL_SYMBOL, len_lines, t)
  File "sctokenizer/tokenizer.py", line 31, in add_pending
    self.colnumber -= (len_lines[k] + 1)
IndexError: list index out of range
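One possible workaround is to normalize line endings before tokenizing. This rests on an assumption not confirmed in this thread: that the line-length bookkeeping used by `add_pending` (`len_lines`) treats only `\n` as a line break, so a run of bare `\r` characters lets the line index run past the end of the list. Java itself accepts a lone CR as a line terminator, so converting it to LF should not change the meaning of valid Java source. `normalize_newlines` below is a hypothetical helper, not part of sctokenizer:

```python
def normalize_newlines(source: str) -> str:
    """Convert CRLF and bare CR line endings to LF.

    Assumption (unverified): the tokenizer only counts LF as a line break,
    so CR-only line endings can desynchronize its line/column tracking.
    """
    # Replace CRLF first so each Windows line ending becomes a single LF,
    # then turn any remaining bare CR (old Mac style) into LF.
    return source.replace("\r\n", "\n").replace("\r", "\n")
```

For example, `sctokenizer.tokenize_str(normalize_newlines(snippet), lang='java')`, where `snippet` is the input above. Whether this avoids the `IndexError` is untested here; the unmatched `}` may still cause a failure on its own.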
Dec1mo commented 2 years ago

Can you try again with the "}" at the end of the Java source code (the input string) removed? This project only works with source code that has no syntax errors.

Thanks.
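To act on the maintainer's suggestion, it can help to run the original input and the same input without its trailing "}" side by side. A minimal sketch: `try_tokenize` is a hypothetical helper (not part of sctokenizer) that takes the tokenizer as a callable, so it runs without sctokenizer installed and just reports which exception, if any, was raised.

```python
from typing import Callable, Optional


def try_tokenize(tokenize: Callable[[str], object], source: str) -> Optional[str]:
    """Return None if tokenize(source) succeeds, otherwise 'ExceptionType: message'."""
    try:
        tokenize(source)
    except Exception as exc:  # catch broadly so any failure (e.g. IndexError) is reported
        return f"{type(exc).__name__}: {exc}"
    return None
```

For example, with `tok = lambda s: sctokenizer.tokenize_str(s, lang='java')`, compare `try_tokenize(tok, snippet)` against `try_tokenize(tok, snippet.removesuffix("}"))` (Python 3.9+), where `snippet` is the string from the report.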