c2nes / javalang

Pure Python Java parser and tools
MIT License

Uncaught TypeError: 'in <string>' requires string as left operand, not NoneType (due to CR handling in lexer?) #43

Status: Open. eddieantonio opened this issue 7 years ago.

eddieantonio commented 7 years ago

I encountered this crash while parsing random Java files from GitHub. This crash occurred on PolarPixellateFilter.java from chrisbatt/AndroidFastImageProcessing.

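UPDATE: I strongly suspect this bug is caused by javalang's handling (or lack thereof) of carriage returns as line terminators. It appears that a double-slash (//) comment has inadvertently commented out the entire rest of the file, even though a carriage return ends that comment well before the end of the file. The Java Language Specification (§3.4) treats a bare CR, a bare LF, and CR LF all as line terminators, so a conforming compiler ends the comment at the CR.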
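To make the hypothesis concrete, here is a minimal sketch contrasting two ways of finding the end of a // comment. It is not javalang's actual lexer code, and both function names are hypothetical. A scanner that only looks for '\n' runs to the end of a CR-only file, while one following JLS §3.4 stops at whichever of '\r' or '\n' comes first.

```python
def comment_end_lf_only(source: str, start: int) -> int:
    """Index just past a // comment, treating only '\n' as a line terminator."""
    end = source.find("\n", start)
    return len(source) if end == -1 else end


def comment_end_jls(source: str, start: int) -> int:
    """Index just past a // comment, treating '\r' or '\n' as a terminator (JLS 3.4)."""
    hits = [p for p in (source.find("\r", start), source.find("\n", start)) if p != -1]
    return min(hits) if hits else len(source)


if __name__ == "__main__":
    # A tiny CR-only "file": two statements separated by a bare carriage return.
    src = "int a = 1; // note\rint b = 2;\r"
    start = src.index("//")
    # LF-only scanning swallows "int b = 2;" into the comment.
    print(repr(src[start:comment_end_lf_only(src, start)]))
    # JLS-style scanning stops at the first CR.
    print(repr(src[start:comment_end_jls(src, start)]))
```

With the LF-only variant, every token after the first comment disappears, which would leave the parser looking at EndOfInput far earlier than expected.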

This crash occurred on both an Ubuntu machine and a macOS machine, both running Python 3.6.1.

Traceback (most recent call last):
  File "/Users/eddieantonio/.pyenv/versions/3.6.0/lib/python3.6/pdb.py", line 1667, in main
    pdb._runscript(mainpyfile)
  File "/Users/eddieantonio/.pyenv/versions/3.6.0/lib/python3.6/pdb.py", line 1548, in _runscript
    self.run(statement)
  File "/Users/eddieantonio/.pyenv/versions/3.6.0/lib/python3.6/bdb.py", line 431, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/Users/eddieantonio/Projects/sensibility/test_fail.py", line 4, in <module>
    import javalang
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parse.py", line 53, in parse
    return parser.parse()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 110, in parse
    return self.parse_compilation_unit()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 302, in parse_compilation_unit
    type_declaration = self.parse_type_declaration()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 347, in parse_type_declaration
    return self.parse_class_or_interface_declaration()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 356, in parse_class_or_interface_declaration
    type_declaration = self.parse_normal_class_declaration()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 394, in parse_normal_class_declaration
    body = self.parse_class_body()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 768, in parse_class_body
    declaration = self.parse_class_body_declaration()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 791, in parse_class_body_declaration
    return self.parse_member_declaration()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 825, in parse_member_declaration
    member = self.parse_method_or_field_declaraction()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 839, in parse_method_or_field_declaraction
    member = self.parse_method_or_field_rest()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 857, in parse_method_or_field_rest
    return self.parse_method_declarator_rest()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 886, in parse_method_declarator_rest
    body = self.parse_block()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 1274, in parse_block
    statement = self.parse_block_statement()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 1339, in parse_block_statement
    return self.parse_statement()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 1465, in parse_statement
    value = self.parse_expression()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 1752, in parse_expression
    expressionl = self.parse_expressionl()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 1767, in parse_expressionl
    expression_2 = self.parse_expression_2()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 1796, in parse_expression_2
    parts = self.parse_expression_2_rest()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 1813, in parse_expression_2_rest
    expression = self.parse_expression_3()
  File "/Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py", line 1855, in parse_expression_3
    while token.value in '[.':
TypeError: 'in <string>' requires string as left operand, not NoneType
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /Users/eddieantonio/.pyenv/versions/sensibility/lib/python3.6/site-packages/javalang/parser.py(1855)parse_expression_3()

The crash happens in parse_expression_3 in parser.py.

Inspecting the frame in pdb reveals that the current token is an EndOfInput:

-> while token.value in '[.':
(Pdb)
(Pdb) p token
EndOfInput "None"
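The TypeError follows directly from the EndOfInput token carrying None as its value: Python's in operator on a str requires a str on the left-hand side. The sketch below uses a hypothetical stand-in Token class (not javalang's) to reproduce the failure and to show one defensive way to write the loop condition. Whether javalang should guard here or instead raise a clearer parse error is a separate question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """Hypothetical stand-in for a lexer token; EndOfInput-like tokens have value None."""
    value: Optional[str]


def is_selector_start(token: Token) -> bool:
    """True if the token starts an array or member selector ('[' or '.')."""
    # Check for None first, because `None in "[."` raises TypeError.
    return token.value is not None and token.value in "[."


if __name__ == "__main__":
    end_of_input = Token(value=None)
    try:
        end_of_input.value in "[."  # same shape as `token.value in '[.'` in the traceback
    except TypeError as exc:
        print("TypeError:", exc)
    print(is_selector_start(end_of_input))
    print(is_selector_start(Token(".")))
```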

However, the primary it just parsed is nowhere near the end of the input:

(Pdb) p primary
Literal
(Pdb) p primary.position
(1, 951)

The only unusual thing about the file is that it uses a bare carriage return as its line terminator (yuck!), so javalang believes the whole file is on a single line; that is why the literal above is reported at line 1, column 951. Otherwise, javac accepts it as syntactically valid Java 8 source code.
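When scanning a large corpus such as random GitHub files, it may help to flag CR-only files up front. A minimal sketch with hypothetical helper names that counts each kind of line terminator:

```python
import re
from collections import Counter

# CRLF must come first in the alternation so it is not counted as CR + LF.
_TERMINATOR = re.compile(r"\r\n|\r|\n")
_NAMES = {"\r\n": "CRLF", "\r": "CR", "\n": "LF"}


def line_terminator_counts(source: str) -> Counter:
    """Count CRLF, bare CR, and bare LF line terminators in the text."""
    return Counter(_NAMES[m] for m in _TERMINATOR.findall(source))


def uses_bare_cr(source: str) -> bool:
    """True if any line ends with a lone carriage return."""
    return line_terminator_counts(source)["CR"] > 0
```

To see the raw terminators, read the file with open(path, newline="") or as bytes; Python's default text mode (universal newlines) translates '\r' and '\r\n' to '\n' on read, which would hide the problem.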

Replication package: javalang-crash.zip

ihayet commented 1 year ago

I encountered this error when parsing line by line. After debugging with pdb, I also found that the token was EndOfInput before the end of the input had been reached. In my case, the input contained no CR ('\r') or newline ('\n') characters at all. Appending a single newline ('\n') to the end of the input (to each line, in my case) resolved the error.
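Both reports suggest a caller-side workaround: convert every line terminator to '\n' and make sure the input ends with one before handing it to javalang. A minimal sketch, assuming the hypothetical helper name prepare_java_source; it does not change javalang itself, and the trailing-newline part rests only on the report above:

```python
def prepare_java_source(source: str) -> str:
    """Normalize CRLF and bare CR to LF, and guarantee a trailing newline."""
    # Replace CRLF first so it collapses to a single "\n" rather than two.
    normalized = source.replace("\r\n", "\n").replace("\r", "\n")
    if not normalized.endswith("\n"):
        normalized += "\n"
    return normalized
```

The result can then be passed to javalang.parse.parse(...). Reading a file with Python's default text mode already performs the newline translation, so the explicit replacement matters mostly for text decoded from bytes or read with newline="".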