facebook / pyre-check

Performant type-checking for python.
https://pyre-check.org/
MIT License

Draft PR for Pysa Fuzzer #886

Closed esohel30 closed 2 months ago

esohel30 commented 5 months ago

Draft PR for the pysa fuzzer project 🚀

arthaud commented 5 months ago

Cool, this is a good start.

A few comments:

esohel30 commented 5 months ago

Enhanced the Pysa fuzzer by adding type annotations, ensuring variables are defined before use, making expression generation truly recursive, and using textwrap.indent for better code formatting. Added a defined_variables set to track declared variables, which improves code validity. Despite these improvements, the fuzzer is still far from perfect and needs further refinement to improve code generation diversity and flow validation. Will continue to work on it extensively! Might even approach it in a different manner now that I have been playing around with it for a bit.
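As a rough illustration of the techniques named in this comment (not the code from the PR itself), here is a minimal sketch of recursive expression generation that only reads names already recorded in a defined_variables set, with textwrap.indent used to place the body inside a function. The helper names gen_expr and gen_function, the v0, v1, ... naming scheme, and the restriction to integer arithmetic are all assumptions made for the example.

```python
import random
import textwrap


def gen_expr(rng: random.Random, defined: set, depth: int) -> str:
    """Recursively build an integer expression over already-defined names."""
    if depth == 0 or rng.random() < 0.3:
        # Leaf: reuse a defined variable when one exists, else a literal.
        if defined and rng.random() < 0.7:
            return rng.choice(sorted(defined))
        return str(rng.randint(0, 9))
    left = gen_expr(rng, defined, depth - 1)
    right = gen_expr(rng, defined, depth - 1)
    return f"({left} {rng.choice('+-*')} {right})"


def gen_function(rng: random.Random, name: str, statements: int = 3) -> str:
    """Emit a function whose statements never read an undefined variable.

    Assumes statements >= 1.
    """
    defined_variables = set()  # names that later statements may read
    body = []
    for i in range(statements):
        target = f"v{i}"
        body.append(f"{target}: int = {gen_expr(rng, defined_variables, 2)}")
        defined_variables.add(target)  # visible only to the following lines
    body.append(f"return {target}")
    code = "\n".join(body)
    return f"def {name}() -> int:\n{textwrap.indent(code, '    ')}\n"


if __name__ == "__main__":
    print(gen_function(random.Random(0), "generated"))
```

Because a name is added to defined_variables only after its assignment has been emitted, every generated function can be executed without a NameError.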

arthaud commented 5 months ago

Cool, another round of feedback:

I believe @alexkassil also has a different idea to generate code that always has a valid flow, feel free to ask him if you are interested.

arthaud commented 5 months ago

My feedback:

alexkassil commented 5 months ago

Hey @esohel30, the idea for this project is to automatically find false negatives, i.e. flows that should be security issues but that Pysa, for whatever reason, doesn't find.

The way to do this is to generate increasingly complex valid flows for pysa to find. Everything generated should be a valid security issue.

Take a look at the tests (the .py files) here: https://github.com/facebook/pyre-check/tree/main/source/interprocedural_analyses/taint/test/integration

Here's an example: https://github.com/facebook/pyre-check/blob/main/source/interprocedural_analyses/taint/test/integration/source_sink_flow.py

from builtins import _test_sink, _test_source

def bar():
    return _test_source()

def qux(arg):
    _test_sink(arg)

def bad(ok, arg):
    qux(arg)

def some_source():
    return bar()

def match_flows():
    x = some_source()
    bad(5, x)

One issue in this file is the flow in match_flows() -> bad()

One way to always generate valid issues is to start with _test_sink(_test_source()), and then pick operations that make it so the flow is one more hop away.

For example, let's say you add 3 kinds of mutation to the fuzzer:

  1. Extra variable
  2. Function call
  3. if statement else clause

And now you randomly pick from those 3 elements 4 times to get [1, 2, 3, 2].

Applying those mutations step by step gets you:

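# After mutation 1 (extra variable):
def f1():
    x = _test_source()
    _test_sink(x)

# After mutation 2 (function call):
def f2():
    x = _test_source()
    f1(x)

def f1(x):
    _test_sink(x)

# After mutation 3 (if statement, flow in the else clause):
def f2(cond):
    x = _test_source()
    if cond:
        pass
    else:
        f1(x)

def f1(x):
    _test_sink(x)

# After mutation 2 again (function call):
def f3(cond):
    x = _test_source()
    f2(x, cond)

def f2(x, cond):
    if cond:
        pass
    else:
        f1(x)

def f1(x):
    _test_sink(x)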

If you keep adding single-hop transformations that preserve the flow, the fuzzer will eventually be able to generate all the valid flows present in https://github.com/facebook/pyre-check/tree/main/source/interprocedural_analyses/taint/test/integration. For simplicity, I think you should not worry about any modelling other than _test_source and _test_sink.
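The approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the fuzzer that was eventually merged: the FlowProgram class, its method names, and the entry(cond) wrapper are hypothetical, and for simplicity the if/else mutation is emitted as its own helper function, with every helper taking an (x, cond) pair so that cond is always in scope.

```python
import random
import textwrap

# Pysa's test stubs expose these names; a plain interpreter does not.
HEADER = "from builtins import _test_sink, _test_source\n"


class FlowProgram:
    """Source text that always contains a _test_source -> _test_sink flow."""

    def __init__(self) -> None:
        self.helpers = []               # generated helper functions
        self.prelude = []               # assignments inside entry()
        self.source = "_test_source()"  # expression carrying the taint
        self.sink = "_test_sink({})"    # call template consuming the taint
        self.counter = 0

    def fresh(self, prefix: str) -> str:
        self.counter += 1
        return f"{prefix}{self.counter}"

    def add_variable(self) -> None:
        """Mutation 1: route the tainted value through an extra variable."""
        name = self.fresh("v")
        self.prelude.append(f"{name} = {self.source}")
        self.source = name

    def add_call(self) -> None:
        """Mutation 2: move the current sink one function call away."""
        name = self.fresh("f")
        self.helpers.append(
            f"def {name}(x, cond):\n    {self.sink.format('x')}\n"
        )
        self.sink = name + "({}, cond)"

    def add_branch(self) -> None:
        """Mutation 3: keep the flow alive only in an else clause."""
        name = self.fresh("f")
        body = f"if cond:\n    pass\nelse:\n    {self.sink.format('x')}"
        self.helpers.append(
            f"def {name}(x, cond):\n{textwrap.indent(body, '    ')}\n"
        )
        self.sink = name + "({}, cond)"

    def render(self) -> str:
        body = "\n".join(self.prelude + [self.sink.format(self.source)])
        entry = f"def entry(cond):\n{textwrap.indent(body, '    ')}\n"
        return "\n".join([HEADER, *self.helpers, entry])


MUTATIONS = [
    FlowProgram.add_variable,
    FlowProgram.add_call,
    FlowProgram.add_branch,
]


def generate(steps: int, rng: random.Random) -> str:
    """Start from _test_sink(_test_source()) and apply random mutations."""
    program = FlowProgram()
    for mutate in rng.choices(MUTATIONS, k=steps):
        mutate(program)
    return program.render()


if __name__ == "__main__":
    print(generate(4, random.Random(0)))
```

Each mutation rewrites either the tainted expression or the sink template and leaves the other untouched, so the flow survives every step by construction. Any generated program for which Pysa reports no issue in entry is therefore a candidate false negative.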

facebook-github-bot commented 2 months ago

@arthaud has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 2 months ago

@arthaud merged this pull request in facebook/pyre-check@99a07a24e31679d69aab008e66da27fe6077ba4d.

alexkassil commented 2 months ago

Congrats and well done @esohel30 !

arthaud commented 2 months ago

Thanks for the hard work! We have finally merged this and found a few false negatives (see upcoming commits).