mock test data generation

noahdietz commented 8 years ago

This issue is open to discuss the viability of generating mock data for STT-generated unit tests. The discussion is being continued here from #107, so refer to that issue if you are catching up.

noahdietz commented 8 years ago

@Maples7 your first comment:

APIs to 4 parts in order: POST to add, GET to query, PUT to modify and DELETE to remove

I understand the sentiment and it has merit. The issue is that this creates "inter-test dependencies," meaning that one or more tests depend on the others succeeding. This isn't something that we want :) These are unit tests, and as such, need to be tested independently of each other.

@mm-gmbd that last sentence there ^ also kind of applies to what you were implying, if I understood correctly. But you acknowledged that in your comment. I would still say that using the "well-known" userId is an unsafe bet. If we are using it for one test, and another test changes it, or fails and mutates it, there is a chance others will fail for unknown reasons. Correct me if I misunderstood :)

Maples7 commented 8 years ago

I just offered a small thought on how to solve the "well-known" userId problem; it doesn't matter to me how it is finally solved. It's true that the tests would not be independent my way, but the only relationship between them is that they operate on the same data, which we create ourselves. I haven't found any drawbacks so far.

Of course, that's just a proposal, the developer of this function makes the final decision. :cherries:

mm-gmbd commented 8 years ago

@noahdietz / @Maples7 / @Remco75 - I've got a little time to flesh out my idea now, so here goes...

In Swagger, we define request schemas and response schemas. Using the users/{userID}/firstName example, the PUT could look something like the following:

paths:
  /users/{userID}/firstName:
    put:
      summary: Change User First Name
      description: A longer description
      parameters:
        - name: body
          in: body
          schema: 
            type: object
            properties:
              newName:
                type: string
                minLength: 2
                maxLength: 100
            required: ['newName']
            additionalProperties: false
      responses:
        200: 
          description: Name successfully changed
        400:
          description: Name unsuccessfully changed because `newName` was invalid
          schema:
            type: object
            properties:
              error:
                type: string
            required: ['error']
            additionalProperties: false

Notice that, in the API itself, we do not define the exact value for newName because this is an API (i.e. where we define schemas). We can define the schema for newName (i.e. the fact that it is a string, a minimum length, and a maximum length), but not the name itself.
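As a rough illustration of how test data could be derived from those constraints alone, here is a minimal sketch. `stringCases` is a hypothetical helper, and the `{ valid, data, message }` shape is chosen for illustration; this is not the algorithm of any particular generator module.

```javascript
// Hypothetical helper: derive boundary test values from a string schema.
function stringCases(schema) {
  var min = schema.minLength || 0;
  var hasMax = typeof schema.maxLength === 'number';
  var cases = [{ valid: true, data: 'a'.repeat(min), message: 'at minLength' }];

  if (min > 0) {
    cases.push({ valid: false, data: 'a'.repeat(min - 1), message: 'below minLength' });
  }
  if (hasMax) {
    cases.push({ valid: true, data: 'a'.repeat(schema.maxLength), message: 'at maxLength' });
    cases.push({ valid: false, data: 'a'.repeat(schema.maxLength + 1), message: 'above maxLength' });
  }
  // A non-string value violates `type: string` regardless of length.
  cases.push({ valid: false, data: 42, message: 'wrong type' });
  return cases;
}

// The newName schema from the PUT definition above.
var newNameSchema = { type: 'string', minLength: 2, maxLength: 100 };
var cases = stringCases(newNameSchema);

cases.forEach(function (c) {
  console.log((c.valid ? 'valid  ' : 'invalid') + ' ' + c.message);
});
```

The valid cases would be expected to produce a 200 and the invalid ones a 400, without the API definition ever naming a concrete value.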

Also, notice the same thing for the response 400. Say for instance, I decide to provide a newName that is just a. I do not define the response error message to be "Error: validation failed because the name 'a' is too short", because I could've just as easily provided a newName that was above the max length (and the error string would be something like "Error: validation failed because the name 'superLongName...' is too long"). Because this is an API, I only define that I should receive an error, and that error should be a string.
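To show that both error messages satisfy the same response schema, here is a minimal sketch. `matchesSchema` is a hypothetical, heavily simplified stand-in for a real JSON Schema validator; it handles only the keywords used by the 400 response above.

```javascript
// Simplified stand-in for a JSON Schema validator. It only understands
// `type` (string/object), `properties`, `required`, and `additionalProperties`.
function matchesSchema(value, schema) {
  if (schema.type === 'string') return typeof value === 'string';
  if (schema.type === 'object') {
    if (value === null || typeof value !== 'object' || Array.isArray(value)) return false;
    var props = schema.properties || {};
    var required = schema.required || [];
    for (var i = 0; i < required.length; i++) {
      if (!Object.prototype.hasOwnProperty.call(value, required[i])) return false;
    }
    for (var key in value) {
      if (!Object.prototype.hasOwnProperty.call(value, key)) continue;
      if (props[key]) {
        if (!matchesSchema(value[key], props[key])) return false;
      } else if (schema.additionalProperties === false) {
        return false;
      }
    }
    return true;
  }
  return true; // keywords outside this sketch are not checked
}

// The 400 response schema from the PUT definition above.
var errorSchema = {
  type: 'object',
  properties: { error: { type: 'string' } },
  required: ['error'],
  additionalProperties: false
};

var tooShort = { error: "Error: validation failed because the name 'a' is too short" };
var tooLong = { error: "Error: validation failed because the name 'superLongName...' is too long" };

console.log(matchesSchema(tooShort, errorSchema));                 // true
console.log(matchesSchema(tooLong, errorSchema));                  // true
console.log(matchesSchema({ error: 400 }, errorSchema));           // false
console.log(matchesSchema({ error: 'x', extra: 1 }, errorSchema)); // false
```

The two messages differ, but a schema-level check treats them the same; only the shape of the response is asserted.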

STT ensures that the schema provided in the API matches the schema of the data in the response -- note that it compares response schema, not response data, because the data is not something that one would include in the API definition:

request({ ... }, function(error, res, body) {
  res.statusCode.should.equal(400);

  validator.validate(body, schema).should.be.true;
})

As far as using a "well-known" userID, I'll reiterate one of my previous points - I don't think it should matter to the developer whether or not it is a safe bet. I feel as if you are judging it from your perspective as a user of the tool - knowing that it may not work well with your particular database. For me, as a user, it works great for my implementation - therefore I would love to have it included in the tool.

The worst that could happen is that, the user provides something like a known userID, and expects tests to pass, but because the data is not what they think it is in the database, then it fails. I would not expect random-data generation to result in false-positives (which would be bad).

Remco75 commented 8 years ago

@mm-gmbd, cool ideas. I will look into it sometime in the coming 2 weeks (sorry, kinda busy). One remark to start off: it will probably be a lot of work to catch all possible validations and write failing data for that schema. We need to think up something clever.

As far as data generation goes: let's look into other modules already doing this to see if we can use that as dependency and join forces there.

mm-gmbd commented 8 years ago

@Remco75 -- that is part of the basis of this discussion. If you take a look at the discussion in #107, I suggested utilizing json-schema-test-data-generator.

I am already doing this today in a gulp task, but it could be easily pulled into this project:

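var fs = require('fs');
var stt = require('swagger-test-templates');
var generate = require('json-schema-test-data-generator');

var swagger = //some swagger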
var config = { ... }

var pathsArr = [];
for (var path in swagger["paths"]) {
  pathsArr.push(path);
}

//NOTE: Looper so we don't exceed call stack... it may actually not be required because the error MAY have been happening by the "generator" trying to generate a buttload of tests because the schema for some operations was just "type: object"
function looper(x, completeFunc) {
  if (x < pathsArr.length) {
    var path = pathsArr[x];
    // console.log("Checking path "+path);
    for (var operation in swagger["paths"][path]) {
      var params = swagger["paths"][path][operation].parameters;
      if (params) {
        for (var i = 0; i < params.length; i++) {
          // Only body parameters that declare a schema can be fed to the generator
          if (params[i].in == 'body' && params[i].schema) {

            var jsonData = generate(params[i].schema);

            if (!config.inputTesting) config.inputTesting = {};
            if (!config.inputTesting[path]) config.inputTesting[path] = {};
            if (!config.inputTesting[path][operation]) config.inputTesting[path][operation] = {};
            if (!config.inputTesting[path][operation]["200"]) config.inputTesting[path][operation]["200"] = [];
            if (!config.inputTesting[path][operation]["400"]) config.inputTesting[path][operation]["400"] = [];

            // Valid data should produce a 200, invalid data a 400
            for (var j = 0; j < jsonData.length; j++) {
              if (jsonData[j].valid == true) {
                config.inputTesting[path][operation]["200"].push({"body": jsonData[j]});
              } else {
                config.inputTesting[path][operation]["400"].push({"body": jsonData[j]});
              }
            }
          }
        }
      }
    }

    setTimeout(function(){
      looper(x+1, completeFunc);
    }, 0);
  } else {
    completeFunc();
  }
}

looper(0, function(){
  // Generates an array of JavaScript test files following specified configuration
  var tests = stt.testGen(swagger, config);
  console.log("Creating "+tests.length+" tests...")
  for (var i = 0; i < tests.length; i++){
    fs.writeFileSync("./tests/"+tests[i].name, tests[i].test, 'utf8');
  }
})
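If the call-stack problem really did come from the generator rather than from the iteration, as the note in the code suspects, the `setTimeout` looper may not be needed. Below is a minimal synchronous sketch of the same bucketing. `buildInputTesting` and `fakeGenerate` are hypothetical names, the generator is injected so the sketch runs on its own, and the `{ valid, data }` case shape is an assumption made for the stub.

```javascript
// Collect body-parameter test data per path/operation without recursion.
// `generate` is passed in so the sketch does not depend on a real generator module.
function buildInputTesting(swagger, generate) {
  var inputTesting = {};
  Object.keys(swagger.paths || {}).forEach(function (path) {
    Object.keys(swagger.paths[path]).forEach(function (operation) {
      var params = swagger.paths[path][operation].parameters || [];
      params.forEach(function (param) {
        if (param.in !== 'body' || !param.schema) return;
        var byStatus = { '200': [], '400': [] };
        // Valid cases go under 200, invalid cases under 400, as in the gulp task.
        generate(param.schema).forEach(function (testCase) {
          byStatus[testCase.valid ? '200' : '400'].push({ body: testCase });
        });
        inputTesting[path] = inputTesting[path] || {};
        inputTesting[path][operation] = byStatus;
      });
    });
  });
  return inputTesting;
}

// Stub generator standing in for the real one (assumed output shape).
function fakeGenerate(schema) {
  return [
    { valid: true, data: { newName: 'Bob' } },
    { valid: false, data: { newName: 'a' } }
  ];
}

var sampleSwagger = {
  paths: {
    '/users/{userID}/firstName': {
      put: { parameters: [{ name: 'body', in: 'body', schema: { type: 'object' } }] },
      get: { parameters: [{ name: 'userID', in: 'path', type: 'string' }] }
    }
  }
};

var result = buildInputTesting(sampleSwagger, fakeGenerate);
console.log(JSON.stringify(result, null, 2));
```

Operations without a body schema, such as the `get` above, are simply skipped, so the result only contains entries that have generated data.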

Remco75 commented 7 years ago

Just another thought: it would be cool to use the example property (if present) to fill the mock data.

This snippet is derived from the pet-schema: "name": { "type": "string", "example": "doggie" }

Careful documentation of your API would then lead to comprehensive mocks!
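A minimal sketch of how that could work, assuming a hypothetical `mockFromSchema` helper that prefers `example` and otherwise falls back to a type-based placeholder. The `petSchema` below is abbreviated for illustration; only the `name` property with its example comes from the snippet above.

```javascript
// Hypothetical helper: build mock data from a schema, preferring `example` values.
function mockFromSchema(schema) {
  if (schema.example !== undefined) return schema.example;
  if (schema.type === 'object') {
    var result = {};
    var props = schema.properties || {};
    Object.keys(props).forEach(function (key) {
      result[key] = mockFromSchema(props[key]);
    });
    return result;
  }
  if (schema.type === 'array') {
    return schema.items ? [mockFromSchema(schema.items)] : [];
  }
  // Placeholder fallbacks; constraints such as minLength are ignored here.
  if (schema.type === 'string') return 'string';
  if (schema.type === 'integer' || schema.type === 'number') return 0;
  if (schema.type === 'boolean') return false;
  return null;
}

var petSchema = {
  type: 'object',
  properties: {
    id: { type: 'integer' },
    name: { type: 'string', example: 'doggie' },
    photoUrls: { type: 'array', items: { type: 'string' } }
  }
};

console.log(JSON.stringify(mockFromSchema(petSchema)));
// {"id":0,"name":"doggie","photoUrls":["string"]}
```

Properties without an example still get a value, so a partially documented API yields usable, if less realistic, mocks.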

apigee-127 / swagger-test-templates

mock test data generation #115