Improve test discovery algorithm

Issue

Currently, we perform a naive search for request sequences longer than n=1. That is, if we have three endpoints (A, B, and C) and n=2, we would generate 6 request sequences:

We can do better than this.

Solution

In this research paper (Section 3.1), Microsoft describes their test generation algorithm which narrows the realm of possibilities with this simple rule:

Check that all objects referenced in a request are produced by some response in a prior request.

This gets tricky with the use of fixtures though, because input precedence is important. I propose a pre-processing test discovery step to address this issue.

Example

Let's say we have four endpoints:

paths:
  /a:
    post:
      parameters:
        - name: "five"
          in: "formData"
          type: "string"
      responses:
        200:
          schema:
            type: "object"
            properties:
              two:
                type: "string"
  /b:
    post:
      parameters:
        - name: "three"
          in: "formData"
          type: "string"
      responses:
        200:
          schema:
            type: "object"
            properties:
              four:
                type: "string"
              five:
                type: "string"
  /c:
    post:
      parameters:
        - name: "six"
          in: "formData"
          type: "string"
      responses:
        200:
          schema:
            type: "object"
            properties:
              seven:
                type: "string"
  /vulnerable:
    post:
      parameters:
        - name: "one"
          in: "formData"
          type: "string"
        - name: "two"
          in: "formData"
          type: "string"
        - name: "four"
          in: "formData"
          type: "string"
        - name: "eight"
          in: "formData"
          type: "string"
      responses:
        200:
          description: "success"

We also have a factory fixture for one, and we want to test /vulnerable.

Algorithm Details

With the pre-processing step, the above snippet can be distilled into this input-output diagram:

5 --> A --> 2
3 --> B --> 4,5
6 --> C --> 7
1,2,4,8 --> vulnerable --> none

Then, working backwards, N dictates how many hops we look back to derive our input parameters. Using the example to illustrate:

With n=1, we don't have prior knowledge of what requests could create the data we need for 1, 2, 4 and 8. Since we have a factory fixture for 1, we can use that to generate that parameter's value. For everything else, we need to fuzz it with the default fuzzing strategy.
When n=2, we search the outputs of the known endpoints to try and get values for what we need. We can see that A and B gives us 2 and 4 respectively, so know we can call A and B to generate our values. However, since we're only limited to a request sequence of 2, we can only call one of them at a time. We would still use our factory fixture for 1, and still need to fuzz 8. Since we don't have prior values for inputs for A and B, we will have to fuzz those values.
When n=3, we can call both A and B to generate all the values we need to properly test the vulnerable endpoint. Furthermore, we should acknowledge some sort of ordering in this: that B needs to be called first, so that its output can be fed into A.

Future Improvements

Assuming this works, we might even be able to suggest an optimal N for the user to run, to capture all cases.

NOTE: The reason why the algorithm works backwards is because we need to be in a situation when we have knowledge of all possibilities that the endpoints provide, before we know which parameters to fuzz with a factory.

This is to say, we are unable to check that all objects referenced in a request are produced by some response in a prior request, because they may simply be fuzzed, or generated with a factory.

Using the example to illustrate, if we proceed with the forwards approach as the paper suggests, we would still generate (/c, /vulnerable) because inputs can still be fuzzed or factory-generated. If we only proceed with requests that use at least one input from all prior output from a request, we would ignore (/a, /vulnerable), (/b, /vulnerable) and even (/b, /a, /vulnerable). Or even this case:

none --> X --> 1
2 --> Y --> 3
1,3 --> vulnerable --> none

Yelp / fuzz-lightyear