Yelp / fuzz-lightyear

A pytest-inspired, DAST framework, capable of identifying vulnerabilities in a distributed, micro-service ecosystem through chaos engineering testing and stateful, Swagger fuzzing.

Improve test discovery algorithm #37

Closed: margaretgorguissian closed this 4 years ago

margaretgorguissian commented 4 years ago

WIP

This PR will be a hybrid of the approach described in issue #8 and the one described in the original research paper.

We'll start with the assumption that requests produce and consume single values: e.g. endpoint /a produces an integer num and /b consumes an integer num. For simplicity's sake, this is ignoring the case where some endpoint /c produces a dictionary that contains an integer num as one of its values.

To begin building a request graph, we need to know what endpoints produce and consume what values. So, two dictionaries are built: one that maps operation_ids to the resources they consume, and one that maps resources to the operation_ids that will produce them.
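To make the shape of those two maps concrete, here's a rough sketch (the `operations` iterable and its `operation_id`/`consumes`/`produces` attributes are placeholders for illustration, not the actual fuzz-lightyear internals):

```python
from collections import defaultdict

def build_resource_maps(operations):
    """operations: iterable of objects with .operation_id, .consumes, and .produces."""
    # operation_id -> set of resource names that operation consumes
    consumed_by_operation = {}
    # resource name -> set of operation_ids that produce it
    producers_of_resource = defaultdict(set)

    for operation in operations:
        consumed_by_operation[operation.operation_id] = set(operation.consumes)
        for resource in operation.produces:
            producers_of_resource[resource].add(operation.operation_id)

    return consumed_by_operation, producers_of_resource
```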

From there, we construct a directed request graph. Every node is an operation_id, and an edge from node A to node B means that operation A consumes a resource that B produces. (This is because, as described in issue #8, we want to "work backwards" from one request.) The graph is stored as an adjacency list. In the previous example, the graph would be represented as follows:

{
     "A": ["B"],
}
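A rough sketch of how that adjacency list could be built from the two maps above (again, the names are illustrative rather than the final implementation):

```python
from collections import defaultdict

def build_request_graph(consumed_by_operation, producers_of_resource):
    # Directed edge consumer -> producer: the consumer uses a resource
    # that the producer creates.
    graph = defaultdict(list)
    for operation_id, resources in consumed_by_operation.items():
        for resource in resources:
            for producer in producers_of_resource.get(resource, ()):
                if producer != operation_id:
                    graph[operation_id].append(producer)
    return dict(graph)
```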

~~Afterwards, a DFS can occur, with a max depth of n, where n is a request sequence length specified by the user.~~

Instead of conducting a DFS, we'll use the existing request sequence generating algorithm, but only add an operation to the sequence if it shares an edge with any of the preceding operations in the sequence. This is because, for any given operation, we have no idea if the request associated with it will use a previously produced value, fuzz a value, or use a factory until we do the work to fuzz the request. Opting to only add requests to the sequence if they share an edge with any previous request is in line with the algorithm described in the original research paper (which only adds a request to the sequence if all of its parameters have been produced by previous requests), while reducing/simplifying the data structures needed and acknowledging that we may use factories or fuzz values.
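Roughly, the check used when extending a sequence would look something like this (a sketch that assumes "shares an edge" is checked in either direction; not the final code):

```python
def shares_edge_with_sequence(graph, candidate, sequence):
    """Return True if `candidate` shares an edge with any operation already in the sequence."""
    for previous in sequence:
        # Either the candidate consumes something a previous operation produces,
        # or a previous operation consumes something the candidate produces.
        if previous in graph.get(candidate, ()) or candidate in graph.get(previous, ()):
            return True
    return False
```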

TESTING

Testing poses an issue, and I'm still trying to think of an effective automated method of testing. I manually walked through request sequences with a debugger and found them to be what was expected. However, comparing a generated list of successful sequences to an expected list is probably not the best route for testing, as the results would be predicated on which endpoints are included in vulnerable_app, which may change!

edit: In regards to testing, after chatting with @domanchi, the conclusion was that creating tests that rely on vulnerable_app endpoints is inevitable, and that we should check for sequences that should not be in the list of sequences, as opposed to checking for all possible sequences. I just added an assert to an existing test that builds a sequence, which I think is okay because it's an integration test.
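For example, a negative-style assertion in that spirit might look like the following (the operation names are made up, and this reuses the edge-check helper sketched above rather than an actual vulnerable_app endpoint or fuzz-lightyear API):

```python
def test_unrelated_operation_is_not_appended():
    # create_user consumes a token produced by get_session_token, but a
    # health-check endpoint has no edge to anything in the sequence, so it
    # should never be appended.
    graph = {'create_user': ['get_session_token']}
    assert not shares_edge_with_sequence(graph, 'get_health_check', ['create_user'])
```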