avajs / ava

Fail when new snapshots are created in CI #1585

Closed · novemberborn closed this issue 5 years ago

novemberborn commented 6 years ago

When running tests in CI environments, if new snapshots are created it implies the tests are behaving differently than during development. This can hide bugs. The snapshot assertions should fail, rather than silently creating the snapshot and passing.

@avajs/core and others, what do you think?

impaler commented 6 years ago

It seems like a reasonable requirement to prevent false positives. I think an error message for this would need to describe the reasoning and fix, something along the lines of:

"Snapshot {x} is missing a fixture file. To prevent false posititives, a verified fixture file is required when testing in the CI environment."

novemberborn commented 6 years ago

How about:

Could not find snapshot. Note that snapshots are not created automatically when in a CI environment.

(This would be shown as the message for the t.snapshot() assertion failure.)
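For illustration, a minimal sketch of a test that would hit this path (the widget module and its render function are hypothetical):

const test = require('ava');
const { render } = require('./widget'); // hypothetical module under test

test('widget output', t => {
  // On a developer machine, the first run writes a new snapshot.
  // Under this proposal, in a CI environment the assertion would
  // instead fail with the message above.
  t.snapshot(render({ title: 'hello' }));
});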

sindresorhus commented 6 years ago

πŸ‘

AJamesPhillips commented 6 years ago

Thanks all for the excellent contributions to date, much appreciated! Any further thoughts on this? ~~I wasn't clear on how AVA would tell it was running in a CI environment.~~ You use is-ci, which uses https://github.com/watson/ci-info/blob/master/index.js Got it. ~~Perhaps this behaviour of erroring if no snapshot was found could be enabled by default, and to generate the new snapshot files you would have to pass the -u flag? This would be a breaking change and would not score great on "self-documenting options", but I can't think of anything better right this moment. Anyone have better ideas?~~

novemberborn commented 6 years ago

@AJamesPhillips would you be willing to help out with this?

AJamesPhillips commented 6 years ago

Hi @novemberborn, I had a quick look at the code. I would prefer to add a flag which, if set, will fail the tests if a snapshot cannot be located, as opposed to only failing when in a CI environment and a snapshot cannot be located. I just had a case now where our tests passed when they should have failed. In this case they were run locally inside a Docker container, which would not be caught by the proposed is-ci check. Any thoughts on this approach (they're not mutually exclusive, so we could do both at some point)? Thanks.

novemberborn commented 6 years ago

In this case they were run locally inside a docker container.

If you run tests in a Docker container, how do you update the snapshots?

The idea is that during development you'd notice the new snapshot files (since you need to commit them). I think if we implement the CI flag then you could set CI=1 in your container and AVA will treat it as such. Would that be sufficient?
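For reference, the is-ci/ci-info detection linked above boils down to roughly the following; this is a paraphrased sketch of ci-info's approach, not AVA's exact code:

const env = process.env;

// CI is reported when a generic flag or a vendor-specific variable
// is present -- which is why setting CI=1 in the container would be
// enough for AVA to treat it as a CI environment.
const isCI = Boolean(
  env.CI ||                     // generic flag (what CI=1 sets)
  env.CONTINUOUS_INTEGRATION || // Travis CI
  env.BUILD_NUMBER              // Jenkins, TeamCity
  // ...plus many vendor-specific variables in the real module
);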

AJamesPhillips commented 6 years ago

Yes, apologies that wasn't clear @novemberborn. Locally we normally run and update our tests outside of Docker. On our Jenkins CI server we build a Docker image and run the AVA tests inside the container. In this case I was running the CI build and tests locally to check it was working correctly. It wasn't, because the build step did not copy the snapshots to the right file name and location for AVA to see them, so AVA then generated new snapshots, which would have masked a subsequent real error.

if we implement the CI flag then you could set CI=1 in your container

That would work, though setting CI=1 when running a container locally is strange. Not saying I object, just saying I think it wouldn't be obvious to others. In our case it would be reasonable, as it would sit in the script file we call on the CI server to set up, build and run the tests in the container. So setting CI=1 inside a file like test-ci.sh would not be unreasonable.

novemberborn commented 6 years ago

In our case it would be reasonable as it would sit in the script file we call on the ci server to setup, build and run the tests in the container.

Arguably you'd want to forward env variables from Jenkins to your container, so I think it's reasonable that you'd need to do the same when testing your container setup outside of Jenkins. I admit it's not immediately obvious, but that also goes for having a configuration option.

billyjanitsch commented 6 years ago

I would love this as a flag as well, and IMO it should be enabled by default in all environments -- not just CI. I've never liked the behavior of auto-creating new snapshots without -u: I imagine most users think of running ava as a pure function which returns a boolean describing whether their tests passed or failed. So it's odd for ava to have side effects by default, especially those which modify VCS-tracked files.

Pragmatically, it complicates scenarios like running a test suite as a commit/push/prepublish hook. I've often successfully pushed commits (relying on my git hook to fail if the tests do anything other than succeed) only to realize that the test suite passed but dirtied the repo, so my (incorrect) commit shouldn't have gone through.
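A sketch of the post-test check this scenario calls for, assuming a git repository (git status --porcelain lists uncommitted changes):

// Hypothetical hook step: fail if running the tests modified tracked
// files (for example, by writing new snapshot files).
const { execSync } = require('child_process');

const dirty = execSync('git status --porcelain').toString().trim();
if (dirty !== '') {
  console.error(`Test run dirtied the repository:\n${dirty}`);
  process.exit(1);
}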

novemberborn commented 6 years ago

Pragmatically, it complicates scenarios like running a test suite as a commit/push/prepublish hook. I've often successfully pushed commits (relying on my git hook to fail if the tests do anything other than succeed) only to realize that the test suite passed but dirtied the repo, so my (incorrect) commit shouldn't have gone through.

Oh that's a good point, I wonder if we could detect that, too?

I hear what you're saying @billyjanitsch, though I'm reticent about having a flag to disable this behavior. Instead, what if we'd write snapshot updates to a temporary location, exit with an instructive failure and exit code of 1, and then have an npx ava --commit-snapshots command (pending bike-shedding) to write them into your source tree?

billyjanitsch commented 6 years ago

exit with an instructive failure and exit code of 1

+1 -- this is all I really want (although the rest of your idea sounds great too πŸ˜„)

I am a bit limited on OSS time at the moment but when I free up I can take a look!

novemberborn commented 5 years ago

what if we'd write snapshot updates to a temporary location, exit with an instructive failure and exit code of 1, and then have an npx ava --commit-snapshots command (pending bike-shedding) to write them into your source tree?

I've opened https://github.com/avajs/ava/issues/2099 to track this.

sramam commented 4 years ago

I needed a workaround for this feature.

The good news: it took a bit of digging around, but the current implementation (as of ~v3.8) actually provides the ideal solution (for my use-case!).

Use the environment variable AVA_FORCE_CI; "ci" and "not-ci" seem to be the supported values.

The rest of this note documents my use-case. Hopefully it'll sway the maintainers should this behavior ever be reconsidered.

The use case

I'm building a code generator that produces some boiler-plate, which is then managed by my users. The generated boiler-plate seeks to encourage good practices and includes generated test cases. These test cases rely on snapshots.

I'd like to ensure that code-generation workflow is tested, especially cross-platform. The generator implements a simple (if time-consuming) test-case:

  1. Start with a clean slate (a tmp dir)
  2. generate boiler-plate (a new npm module is generated)
  3. npm install
  4. npm run compile
  5. npm run test

The problem

That last step fails on the CI server. To be clear, step 5 runs the generated code's tests, which use snapshots. Ideally, I should not provide pre-generated snapshots.

Ideal solution

I'd like an environment variable to control this behavior, not a flag. This way the boiler-plate generator does not have to do different things for CI tests vs. regular use.

AVA_FORCE_CI fits the bill perfectly!
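A sketch of how this can be applied, using execa as in the meta-program test further below (note that AVA_FORCE_CI is internal to AVA, per the reply that follows, so this may break without notice):

const execa = require('execa');

// Run the generated module's tests while forcing AVA to behave as if
// it were NOT in CI, so snapshots can still be created there.
execa.sync('npx', ['ava'], {
  cwd: '/tmp/generated-repo',      // wherever the boiler-plate was generated
  env: { AVA_FORCE_CI: 'not-ci' }, // execa merges this into process.env
  stdio: 'inherit',
});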

novemberborn commented 4 years ago

To be clear, step 5 is running code on the generated code, which uses snapshots. Ideally, I should not provide pre-generated snapshots.

Could you elaborate why you need t.snapshot() at all?

AVA_FORCE_CI fits the bill perfectly!

That's a private variable. Don't use it.

sramam commented 4 years ago

I like to write my functions like so:

function usefulThing(...allInputs) {
  // actually do useful things with the input(s)
  return allOutputs;
}

A very convenient way to test these, is

test("test1", (t) => {
  const result = usefulThing(sample_input_A);
  t.snapshot(result)
}
// and so on for other sample_inputs.

While this looks like the functions are "pure" (meaning no side effects), in reality the functions are interfacing with external APIs with 100s of lines of output (and state, but that's not important here).

It's very tedious to build exact matchers for such output. Moreover, I don't control the external APIs and cannot make presumptions about their API change management process. Snapshots guard me against any changes - well managed or not.

When a test case fails, I only need to reason about whether the change is impacting my code or not. If not, I just update snapshots and move on.

The whole setup is a productivity booster - not to mention a worry minimizer.

Much of my use of AVA is built on its snapshot management - both the ease of use and the binary storage (what can I say - I like strong guarantees!). Over time I almost don't use any of the other assertions, apart from the occasional t.is() or t.throws().


That's a private variable. Don't use it.

I realize. This note was meant to provide a good use-case and a nudge for why it deserves better treatment.

I can't claim my code generator is a meta-program, however it is still meta in the sense that it generates code. AVA and its snapshots have made building it much, much easier.

Given the investment I've made in building my testing strategy around ava, I panicked on realizing the root cause of my problems. It'd be a deal breaker for me.

What is curious is that the release notes mention this being introduced in v2.0.0. I was, however, running it on GitHub Actions with v3.0.0-beta2. Perhaps it was the detection of gh-actions that has changed.

Please do consider more formal treatment of AVA_FORCE_CI or provide some equivalent functionality.

novemberborn commented 4 years ago

For snapshot testing to work, you need to compare against a known-good snapshot. That should involve a human looking at the generated snapshot report, ideally during code review.

If you generate a snapshot during CI then that does not happen, and you won't detect unintended changes.

It's not clear to me yet why you need to create new snapshots in CI. Where is your known-good snapshot coming from?


Perhaps it was the detection of gh-actions that has changed.

Yes I reckon so.

sramam commented 4 years ago

Perhaps my speaking in abstract terms is causing the point to get muddled.

My program is very similar to create-react-app or yeoman like programs: It's a program that generates a program.

To set the stage, there are three "programs" that we'll refer to:

metaProgram(userInput, templateProgram) => generatedProgram

That is to say: the meta-program takes user-input and customizes the template-program, to yield the generated-program.

  • The template-program
  • The generated-program
  • The meta-program

Step 2 prevents the snapshots from being "generated". Steps 3 & 4 succeeding is the success criterion of the test.

In the CI environment step 4 fails, since it lacks previous snapshots, which is the problem I am trying to solve.


This sounds complicated, but is simpler than it looks. Importantly, once it works, it is extremely empowering when making changes all around.

Hopefully this explains it sufficiently. Please let me know if it's still not clear.

I completely understand the underlying intent of preventing snapshots from being built in the CI environment. While not a vanilla use of ava, my use-case is very legit too and deserves the override it seeks! :grin:

novemberborn commented 4 years ago

So the generated program ends up generating snapshots? How do you ensure that they're "good"?


I completely understand the underlying intent of preventing snapshots from being built in the CI environment. While not a vanilla use of ava, my use-case is very legit too and deserves the override it seeks!

I am very interested in exposing low-level building blocks so you can build more powerful testing solutions. Just need to figure out what the right blocks are πŸ˜„

sramam commented 4 years ago

So the generated program ends up generating snapshots?

Exactly! The meta-program test generates it on the CI system, verifies that it still compiles+executes and then discards it.

How do you ensure that they're "good"?

The template-program is responsible for ensuring that the snapshots are sufficient. Remember, this is all boiler-plate - so compile-and-execute is a sufficient "goodness" criterion.

The customizations do not involve complex logic - just generating the right artifacts, with the right names, that are wired up together.

The compilation tests that the renaming was correct. The (test) execution ensures that all the wiring up is correct.

Most of the generated-code is a no-op scaffolding.

I am very interested in exposing low-level building blocks so you can build more powerful testing solutions. Just need to figure out what the right blocks are πŸ˜„

πŸ‘

novemberborn commented 4 years ago

Okay I'm still confused what code generates the snapshots.

It sounds like maybe a "write snapshot without asserting" concept would do the trick, but I also wonder if you can't emit something you can use with t.deepEqual().

sramam commented 4 years ago

Okay I'm still confused what code generates the snapshots.

This is the simplest I can boil down the code to. Hopefully it'll help:

template-program.js


const name = "tname";
function usefulFunction() {
    // const result = await fetch("3rd-party-api", ...);
    return { 
      name,
      // ...result
   };
}

// and the associated test
test("usefulFunction", (t) => {
    t.snapshot(usefulFunction());
});

meta-program.js

const fs = require("fs");
const execa = require("execa");
const test = require("ava");

function customize(userInput, outputDir) {
  const template = fs.readFileSync("template-program.js", "utf8");
  const generated = template.replace(/tname/g, userInput.name);
  fs.writeFileSync(`${outputDir}/generated-program.js`, generated, "utf8");
}

// and the associated test

/**********************************************
 * This is the test case I am advocating for! *
 **********************************************/
test("meta-program", async (t) => {
  const userInput = {
    name: "customized"
  };
  const outputDir = "/tmp/generated-repo";
  customize(userInput, outputDir);
  await execa("npm", ["install"], { cwd: outputDir });
  await execa("npm", ["run", "compile"], { cwd: outputDir });

  /*******************************
   * The next line in particular *
   *******************************/
  await execa("npm", ["run", "test"], { cwd: outputDir });
  t.pass();
});

generated-program.js

const name = "customized";
function usefulFunction() {
    // const result = await fetch("3rd-party-api", ...);
    return { 
      name,
      // ...result
   };
}

// and the associated test
test("usefulFunction", (t) => {
    t.snapshot(usefulFunction());
});

It sounds like maybe a "write snapshot without asserting" concept would do the trick, but I also wonder if you can't emit something you can use with t.deepEqual().

You are essentially asking for a defense of my testing strategy and the use of snapshots! I'm game, but will have to be careful - it can slip out of scope very quickly.

While the simple boilerplate above only hints at it, the "running in production" scenario involves result structures with 100s of properties and sub-structures. Not to mention 100s of API endpoints.

Moreover, I don't control these structures. So what I really need to know is:

  • what did it look like last time I made modifications?
  • Did it change?
  • How?

Recreating an expected object or cherry-picking things to assert is a nightmare.

As a general rule, since discovering snapshots, I try (very hard!) to avoid t.deepEqual(). It requires me to maintain the "expectation" manually, when the snapshot does it automatically.
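To make the contrast concrete, a sketch (the third-party call is a stand-in):

const test = require('ava');

// Stand-in for an external API call with a large result structure.
const callThirdPartyApi = () => ({ name: 'x' /* ...100s of properties */ });

// With t.deepEqual() the expectation is maintained by hand:
test('deepEqual version', t => {
  t.deepEqual(callThirdPartyApi(), {
    name: 'x'
    // ...every property would have to be kept in sync manually
  });
});

// With t.snapshot() the recorded snapshot is the expectation,
// refreshed after review with ava -u:
test('snapshot version', t => {
  t.snapshot(callThirdPartyApi());
});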

This testing strategy for 3rd party interfacing uses a few other tricks I am glossing over, but those are not pertinent here. It probably deserves a blog post, but I need to ship first...

I have noticed in other threads that you don't see snapshots as the crown-jewel of testing. This is not the right forum, but I'd love to debate you on that ... πŸ˜‰ mis-conception πŸ˜‰ !

sramam commented 4 years ago

Also, the template-program was trivialized for this presentation; in reality it's at the level of complexity of a Rails/Django model+controller. It has enough substance to warrant this approach.

novemberborn commented 4 years ago

You are essentially asking for a defense of my testing strategy and the use of snapshots! I'm game, but will have to be careful - it can slip out of scope very quickly.

I'm trying to understand your use case, to see how AVA can help not just you, but other use cases like yours.

So what I really need to know is

  • what did it look like last time I made modifications?
  • Did it change?
  • How?

How do you determine this if the snapshot is created by the generated program in CI?

I have noticed in other threads that you don't see snapshots as the crown-jewel of testing.

You seem to like AVA's snapshots. Who do you think spent months on that implementation? πŸ˜‰

sramam commented 4 years ago

So what I really need to know is

what did it look like last time I made modifications? Did it change? How?

How do you determine this if the snapshot is created by the generated program in CI?

When building the template-program, I am manually validating that the snapshot is correct. Both the template-program and generated-program (at t=0) are scaffolding - a no-op for practical purposes.

The generator of the scaffolding (the meta-program) does not generate logic - it just renames variables (regexp/replace) and sets import paths and such.

The logic of the generated program is untouched, so can be presumed to be correct.

Which leads to the important corollary: I don't have to manually validate the generated snapshots are correct at this point.

It's presumed. If the test runs, the snapshots are guaranteed correct.

The question then is - do I need to run the tests? Both the compile and test-executions on the generated code are still valuable. They ensure:

  1. the template-program did not introduce new capability that needs to be customized.
  2. the wiring needs have not changed.
  3. the underlying dependencies are not interacting adversely with some generation logic.

The last one is a real corner case and surprised me, but I ran into it, of all things, when generating package.json during the ava@2.x -> ava@3.x upgrade. I was using a cached version of the AVA config, and that caused problems (sources was being generated from a cached version in the meta-program).

Basically, that empty run of the test cycle has proven its value to me - until it fails, I can live worry-free. Initially, some corner cases slid past this strategy, but with a little run time it catches most if not all of them.

The real value of the test cases is to provide the user of the generated-code with a convention of how test cases need to be written. But that is completely outside the scope of this discussion.


Reading through, I missed addressing this comment of yours:

maybe a "write snapshot without asserting" concept would do the trick

If I am understanding what you mean correctly, actually not.

I do not want to fiddle with the generated code to execute it or its tests when executed in the context of the meta-program testing. The environment variable is the ideal solution.

The constraint you have in place is exactly right for all programs that do something, which is why you have been consistent in asking "who validates the snapshot".

The only case I think where it does not apply is when a program generates another program.

Typically programs that are generated by programs (at least circa 2020), don't do very sophisticated things. They need humans to adapt them to have any utility.

Even in the example presented, the scaffolding does not generate the 3rd party API calls. The human receiving the generated-program adapts it to do so.

However, the meta-program still needs an assurance that it generated a program that will compile and run. Which only requires an execution of the tests - at least it's the simplest.

For all this discussion, it's the ideal feature request - don't change a thing!

Especially do not change the coupling between AVA_FORCE_CI="not-ci" and its effect of allowing snapshots to be generated in CI. At best, document it for clarity.


You seem to like AVA's snapshots. Who do you think spent months on that implementation? πŸ˜‰

touchΓ©! Thank you for that effort. It's so elegant and tamper-proof for normal usage that I can't imagine developing code without it anymore.

I would still love the debate though - this is not the first time I've seen you advocate for t.deepEqual() in preference over t.snapshot(). IMHO the advocacy should be inverted in direction!

novemberborn commented 4 years ago

When building the template-program, I am manually validating that the snapshot is correct.

[…]

I don't have to manually validate the generated snapshots are correct at this point.

It's presumed. If the test runs, the snapshots are guaranteed correct.

Right. So when you run the test suite for the template program, you check snapshots. And then you run the test suite for the generated program and you're basically making sure there are no exceptions.

I think the best approach for your use case would be to wrap the t.snapshot() call:

const snapshotOrPass = (t, value) => {
  if (testingTemplateProgram) {
    t.snapshot(value)
  } else {
    t.pass()
  }
}

And then use snapshotOrPass(t, result) in the tests.

It's a little awkward (at least until you can install custom assertions on t). But you can control how the function determines what to do, either through an environment variable or by passing an extra argument (npx ava -- --generated). The latter becomes available in process.argv inside the test processes.

Heck, you could rewrite the test files in the generation process to replace the t.snapshot() call with a t.pass().
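For the argv variant, a sketch (the --generated flag name is just an example):

// Arguments after `npx ava -- ...` show up in process.argv of the
// worker processes, so the helper can branch on them.
const testingTemplateProgram = !process.argv.includes('--generated');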

sramam commented 4 years ago

Currently, using AVA_FORCE_CI works just fine without any more code and is really elegant. It's easy to explain, easy to understand and easy to reason about.

The proposed solution would have none of these attributes, not to mention requiring a boatload of code that no one really cares about.

Not sure I understand your strong objection to treating AVA_FORCE_CI as part of the public interface.

It's clear you use this for more than treating snapshots differently on the CI system. Further, the design and use of the variable look very stable; I am not seeing why it would need dramatic change. On the off chance that you do need to modify its behavior, having exposed it as a public interface would at least prompt some community discussion.

FWIW, I did find a similar problem with husky, and the solution there too is to provide an environment variable. As we have discussed previously, that issue is also rooted in a generator program being tested on CI servers!

novemberborn commented 4 years ago

Not sure I understand your strong objection to treating AVA_FORCE_CI as part of the public interface.

It exists so that AVA's own test suite can run reliably in Node.js' CITGM environment. It may change so that its internal use is clearer.

Nothing's stopping you from using it, of course.

I think we've exhausted this discussion.

sramam commented 4 years ago

Thank you for indulging me.

I'll keep a lookout for release notes in the future, but if you do decide to change its behavior, I'd appreciate a heads-up so I can plan for the change.