eliben / pycparser

Complete C99 parser in pure Python

Uncaught exception ValueError in CParser.parse #497

Closed by DavidKorczynski 1 year ago

DavidKorczynski commented 1 year ago

The following program crashes with an uncaught exception:

import sys
import atheris
import pycparser

def TestOneInput(data):
  fdp = atheris.FuzzedDataProvider(data)
  lex_optimize = fdp.ConsumeBool()
  yacc_debug = fdp.ConsumeBool()
  yacc_optimize = fdp.ConsumeBool()

  c_source = fdp.ConsumeUnicodeNoSurrogates(sys.maxsize)
  _c_parser = pycparser.c_parser.CParser(
                lex_optimize=lex_optimize,
                yacc_debug=yacc_debug,
                yacc_optimize=yacc_optimize)
  try:
    _c_parser.parse(
        c_source,
        ''
    )
  except pycparser.c_parser.ParseError:
    pass
  except AssertionError:
    pass

data = (b"\xff\xff\x74\x7b\x23\x31\x75\x0a")
TestOneInput(data)

Here, atheris is the fuzzing engine from https://pypi.org/project/atheris/

The stack trace is as follows:

 === Uncaught Python exception: ===
ValueError: invalid literal for int() with base 10: '1u'
Traceback (most recent call last):
  File "fuzz_c_parser.py", line 33, in TestOneInput
  File "pycparser/c_parser.py", line 147, in parse
  File "pycparser/ply/yacc.py", line 331, in parse
  File "pycparser/ply/yacc.py", line 1061, in parseopt_notrack
  File "pycparser/c_lexer.py", line 76, in token
  File "pycparser/ply/lex.py", line 350, in token
  File "pycparser/c_lexer.py", line 325, in t_ppline_NEWLINE
ValueError: invalid literal for int() with base 10: '1u'
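
The last frame shows `int()` being called on the string `'1u'` while the lexer finishes a preprocessor line directive. As an illustration of how such a conversion could be guarded, here is a minimal sketch. The helper `parse_directive_line_number` and the exception `DirectiveError` are hypothetical names, not part of pycparser, and the idea that the line-number text reaches `int()` unvalidated is an assumption drawn only from the traceback:

```python
import re

# Hypothetical helper, NOT part of pycparser: check the line-number field
# of a line directive (e.g. the "1" in `# 1 "file.c"`) before converting it.
_DECIMAL = re.compile(r"[0-9]+")


class DirectiveError(Exception):
    """Hypothetical error for a line directive with a non-decimal line number."""


def parse_directive_line_number(text):
    # fullmatch rejects suffixes such as the "u" in "1u".
    if _DECIMAL.fullmatch(text) is None:
        raise DirectiveError(f"invalid line number in directive: {text!r}")
    return int(text)


# The bare conversion fails exactly as in the traceback:
try:
    int("1u")
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: '1u'
```

With a check like this, the malformed directive would surface as a controlled error instead of a bare `ValueError`.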

FuzzedDataProvider converts the raw byte sequence into primitive values. Writing out the values that fdp produces gives the following program, which triggers the same uncaught exception:

import pycparser

lex_optimize = True
yacc_debug = True
yacc_optimize = False
c_source = "#1u\n"

_c_parser = pycparser.c_parser.CParser(
              lex_optimize=lex_optimize,
              yacc_debug=yacc_debug,
              yacc_optimize=yacc_optimize)
try:
  _c_parser.parse(
      c_source,
      ''
  )
except pycparser.c_parser.ParseError:
  pass
except AssertionError:
  pass
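
The program above handles only `ParseError` and `AssertionError`, so the `ValueError` escapes. A caller that wants a single exception type for all malformed input could wrap the parse call. This is a minimal sketch, not a fix for the lexer: `parse_or_reject` and `RejectedInput` are hypothetical names that do not exist in pycparser, and the parse function is passed in as an argument so the sketch runs without the library installed:

```python
class RejectedInput(Exception):
    """Hypothetical: one error type for input the parser cannot handle."""


def parse_or_reject(parse, source, filename=""):
    # `parse` is any callable of the shape parse(source, filename),
    # for example the bound method of a CParser instance.
    try:
        return parse(source, filename)
    except ValueError as exc:
        # The traceback above shows ValueError escaping from the lexer;
        # re-raise it under the single hypothetical error type.
        raise RejectedInput(str(exc)) from exc
```

A call would then look like `parse_or_reject(_c_parser.parse, c_source)`, with `RejectedInput` handled next to `ParseError`.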

eliben commented 1 year ago

In general, pycparser's goal is to parse valid C code. I'm not particularly interested in fuzz outputs for that reason.

I don't mind looking at a specific report, but it has to be submitted as an MRE (minimal reproducible example) that I can just run without installing third-party packages or fuzzing tools.