The Testing Ecosystem is Frustrating on its Best Day

I'm either dumb (likely) or our current practices with testing and the entire ecosystem is pretty broken. I've always wanted apps I work on to be really well tested, but it feels way harder than it should be.

Why Test?

First, why should we write tests? What's the point? What do we get out of it?

There is a two-part answer that addresses all of those questions:

We can sleep better at night.
- Knowing our users don't have to fight with our app.
- Knowing our laziness isn't causing real frustration, stress, and pain.
- We're not introducing more cortisol to the world.
- We're not making the world a worse place (when people are stressed they take it out on everything around them).
We don't cause regressions.
- When we (or someone on the team) changes something, it doesn't cause a different page in the app to break or display the wrong data.
- We don't have to manually QA every single possible state of our app over-and-over-and-over again.

That's it. We want to do our best to make sure our users aren't fighting with our app, and we don't want to have to check every single little thing every time someone opens a Pull Request.

With those two goals in mind, let's proceed with some of my gripes before getting to a solution, then get back to more griping.

Stop Mocking

Let's get this over with because it's one of my biggest pet peeves in the testing community.

Mocks are one of the stupidest concepts I've ever tried to wrap my head around—and I have tried for almost a decade, asking many professionals, because I thought I was simply missing something, but it turns out I was right and mocking really is just dumb for the most part.

The idea behind mocks goes something like...

Let's test something, but instead of using the function or data that the real app would use, let's just completely make something up that way it is faster and easier to set up.

How does this actually confirm whether a user is experiencing your app the way it was intended?

Think about it, you want to test whether a user can create an account on your website, so you begin writing a test for it and realize you have to hit your API to actually create the account in the database.

Well, you don't want to actually create the account in the database, so instead, you just pretend the API did what is was supposed to do, and everything works as expected.

Sweet. Your tests pass and everyone is happy until thousands of users can't make an account. How would you know your API had changed? How would you know the frontend needed to change to reflect the API changes? You never actually talked to your API. Your tests always just played around with hard-coded JSON in a make-believe fantasy world. 👇

Are mocks easy to setup and use? Yes. Are you purposefully tricking yourself and every company stakeholder into thinking your app is well-tested? Yes. This isn't a good trade-off.

So what's the alternative? How do you actually test forms and APIs and stuff without having to actually inject junk into the database?

Set up a Proper Test Environment

There's no way around this. If you actually want to test your app in a confident manner without dangerously cluttering your production db with fake data from your tests, you have to set up a test environment.

This takes a bit of setup, but once you nail the pattern, you can create a custom test env for any app. It mostly revolves around replicating your database and creating pretty thorough seed files with plenty of real-ish data.

You want to work with as much real-ish data as possible. You can create a script to:

Delete the old test db.
Get a SQL dump from production.
Replace PII with fake PII.
Import that dump into a fresh db.
Run this as needed (preferably daily through a CRON job until development on the app slows down) and commit it to repo to keep your team on the same page, and your test data fresh. Without fresh, production-ish data, how is our approach different from mocking data?

You can/should use this script for your staging env as well to keep it as up-to-date with real-ish data as possible too.

Warning Please be sure to have backups upon backups of your production db in case you screw something up and need to rollback. This shouldn't need to be said. If you find it difficult to setup a CRON job to get a production db dump and send it to S3, then at least make sure your db host are backing up your db (you will probably have to pay a bit for this service).

Now you can confidently create a test db, do whatever you want to it, then delete it. You can do this billions of times, locally or in CI, with or without considering a team, and do whatever you want to that db.

When is mocking okay?

Almost never. Stop going back to your evil mock ways.

"What if I have to communicate with a 3rd party API?"

This is going to hurt you because it's extra work, but just create a test account for that 3rd party API, add that API token to your .test-env file. And then create another script to export production data from the original account, fake PII, and import it into the test account.

Now that I think about it, we should only be exporting data (from a production db or a 3rd party API) via whitelisting records. PII can always be introduced to an API.

This might seem like overkill, and if you're only using the API for a couple payloads, then mocking is probably fine, but if your app heavily relies on specific 3rd party APIs, then the freedom to test against their actual endpoints, libraries, and SDKs will save you time, and quickly catch a lot of bugs, in the long run.

As an example, imagine your app sells stuff or takes donations and relies heavily on Stripe. You haven't upgraded the Stripe SDK dependency in your app in a while. Let's say you're on something like 2.0.3 (these are imaginary semver versions I'm picking); several months go by and your tests have been passing; someone hacks Stripe; Stripe fixes it quickly and releases the fix as 4.3.4.

How do you confidently upgrade to 4.3.4? You have 400 tests that were mocked with payloads you got from 2.0.3.

Do you go through every single test and compare its request and response to the 4.3.4 version, and tweak the mocks to match 4.3.4? You will almost certainly make a mistake that won't be caught until something happens enough times in production that a user finally speaks up. You've potentially done millions of dollars of damage to your company.

On the other hand, if you had a test env setup from the beginning, everyone on the team could've been upgrading willy-nilly the entire time. You're not messing with real money (Stripe also appreciates the concept of test environments so you just change your Stripe API secret key in your .test-env file et voila) so who cares if something breaks in your chore/upgrade-stripe branch?

Is it ever okay to mock?

Fine. These are the only three instances I can see mocking being useful.

Again, if you are hitting a 3rd party API for a couple small payloads of something really, truly, insignificant to your app, and you are coding defensively on the frontend to simply not show that data if that API broken.
If a 3rd party API is just way too expensive to justify creating an extra testing account for it.
If a 3rd party API doesn't have any concept of test envs; has a really bad API for exporting/importing data; or they don't adhere to semver very well. In all cases it's really hard to write high-quality export/import scripts for your test env. I would advise searching for an alternative service or even just recreating what they do for your own company's specific use case.

Let's stick a pin in the whining for a second to go over some testing approaches and why they fall short.

Testing a Component in React

Why mount a component at all? Why do this? Does knowing a component has x prop confirm the end-user is seeing a specific view the way it was meant to be rendered?

Pseudo example:

// some-page.tsx
const columns = ['Employee', 'Salary']
const cellData = [ /* ... */ ]

<Table
  columns={columns}
  cellData={cellData}
/>

// some-page.test.tsx
test('Column header should be Employee', () => {
  const columns = ['Employee', 'Salary']
  const mounted = mount(<Table columns={columns} />)
  expect(mounted).toBeVisible()
  expect(mounted.props.columns).toEqual(columns)
  expect(mounted.someSelectorFunc('th:first-of-type')).toHaveText('Employee')
  // ✅ Passes!
})

Sweet, it passed. Does this test, or any component test really, 100% confirm a user is able to see that Employee?

Does it render it correctly with the correct CSS? Can you look at a picture or load the state of the app at the exact moment the assertion was executed to confirm it looks correct?
Is it actually visible-visible? I've had plenty of false positives (and negatives) where simply checking visibility was super frustrating to check for, then flaky.
Does confirming the props are XYZ actually mean anything?
It's annoying we have to type so much code to simply test "If I look at this page, it should have a header that says Employee." And holy crap once you start actually clicking on things and waiting for elements to hide, pages to change, elements to transition into view, etc. it becomes a huge pain. Like, look at all the stuff you have to do to click on a button and see if it triggered some element to be modified: https://kentcdodds.com/blog/fix-the-not-wrapped-in-act-warning

So combos like Enzyme or Testing Library + Jest are pretty useless.

Kent C. Dodds seems very smart when it comes to testing, and probably the most famous person in the "testing JS" sphere (in fact he has a course called Testing Javascript), but even his testing library seems like it only provides a simpler way to mount a component then adds a couple shortcut functions that just abstract CSS selectors a smidge (shout-out to trying to make devs more a11y-friendly by promoting selectors like "findByText" and "getByRole").

Don't get me wrong Testing Library is 1000x easier/better than Enzyme, but it still is just looking at some JSDom, and working with that JSDom is incredibly frustrating.

All of this said, Jest is actually pretty amazing. It is super lightweight, fast, has a really nice watch mode, has a nice colorized CLI, all kinds of configuration options, etc.

Jest is just a great tool. Good job Jest.

E2E falls short

At first glance, E2E seems like a good idea:

Open a real browser.
Interact with a website.
Confirm things look a certain way.

Tools like Selenium, Nightwatch, TestCafe, Cypress, Playwright, etc. have been around for a very, very, long time. People have been using them to create all kinds of tests for a decade or so.

A few problems with these...

If you run these headed (i.e. you can see the browser pop up and click around) then they are slllloooowwww. It just takes the browser forever to start.

We still have to write code to click things, interact with forms, etc. but at least we don't have super weird/awkward stuff like act to fight with.

Honestly, Cypress handles state change like I would expect.

Look at this https://www.cypress.io/blog/2019/02/05/modern-frontend-testing-with-cypress#toggle-completed-state

That's exactly what I want. To be able to write a test like this:

test('When the Create Account button is clicked, it gets removed from the page, and a loading spinner gets added', () => {
    const btn = find('#create-account-btn')
    btn.click()
    expect(btn).not.toBeInTheDocument() // even if it animates out
    expect(find('#spinner')).toBeInTheDocument() // even if it animates in
})

Cypress even waits on animations and such.

I cannot emphasize how much I hate that act() warning so Cypress immediately beats everything.

Imagine trying to learn exactly when/how/where to use act() and all the gotchas, and then teaching your team that. It won't happen. It isn't worth it. Just use Cypress.

Downsides to Cypress

Cypress is slow. Run Cypress in headless mode via cypress run (instead of cypress open).

Cypress is also a real pain to get working in Docker (especially on M1 Chips). I half suspect it is on purpose. If their official Docker images worked, why would anyone purchase their Cypress Dashboard product? I wouldn't. So there is a lot of lock-in with Cypress.

Playwright was really close

Playwright looked like the best of all worlds:

E2E
Fast, lightweight
You can Dockerize it pretty easily.
Great API that waits on elements to animate and change state and such (like Cypress does).

Playwright doesn't work with Yarn 2's PnP. Its codebase is riddled with references to node_modules (which don't exist in Yarn 2 PnP) and like most OSS maintainers, no one is excited to upgrade their project to support a specific tool.

Playwright can't into Typescript paths: https://github.com/microsoft/playwright/issues/7121

Screenshot testing

I keep coming back to this idea that the answer to all my testing problems revolves around screenshots somehow. My reasoning is what if I forgot to test for a sidebar visibility in an E2E test suite? Would the E2E runner care? No. But with screenshots, yes it would. It would also test for the visibility of certain elements without me having to actually type that stuff out.

For instance, in any other test runner, I'd have to do something like:

test('Clicking the submit button shows a success message and removes the submit button then adds a new newsletter signup form', async () => {
    find('#btn').click()
    expect(find('#success_message')).isVisible()
    expect(await find('#btn')).not.toBeInTheDocument()
    expect(await find('#newsletter_form')).toBeInDocument()
    expect(await find('#newsletter_form_checkbox')).isVisible()
    expect(await find('#newsletter_form_checkbox').attr('checked')).isFalse()
    expect(await find('#newsletter_form_btn')).isVisible()
    expect(await find('#newsletter_form_btn').attr('disabled')).isFalse()
})

What if I could combine everything after the "Do some interaction" part, every assertion (and if you're writing really thorough tests, you need to write a lot), into a single assertion like so:

test('Clicking the submit button shows a success message and removes the submit button then adds a new newsletter signup form', (oldSnapshot) => {
    find('#btn').click()
    wait('500ms')
    const newScreenshot = screenshot()
    expect(oldSnapshot).isSameImage(newScreenshot)
})

👆 So much easier. Any dev can pick this up immediately. Multiple that by hundreds of tests in a project.

There are projects like:

Percy and Happo are expensive and lock-in. Chromatic is expensive and lock-in, and only works with Storybook (sidenote: I'm not a big fan of Storybook or Docz either, you can probably generate cleaner/more-helpful docs pretty easily from types, mocked 😱 data, and some loop fn).

jest-screenshot sounds very promising but it's unmaintained and I ran into another Yarn 2 problem https://github.com/Prior99/jest-screenshot/issues/83

If jest-screenshot is to be believed then jest-image-snapshot is slow. I also had some flaky tests with jest-image-snapshot.

Alas

Almost everything in the 2021 JS ecosystem feels off (with the exception of a few tools like NextJS which is so nice).

I don't think my ideal testing tool exists right now.

I want something simple and lightweight so I can plop it in Docker and introduce it to CI environments pretty easily.
I want something where the API automatically waits for elements to appear and times out after defaultMs = 1000 like Cypress, Playwright, and https://testing-library.com/docs/dom-testing-library/api-async#waitfor do.
I love Jest's colorful, clean, diffing, CLI tool and the various types of watching it can do (--watchAll, incremental, etc).
It can't reference node_modules and has to be Yarn 2+ friendly.
I want it to take screenshots using all the fastest libs (so it's "10x faster" than jest-image-snapshot) so I can ensure there aren't visual regressions (hugely important) but also just completely sidestep having to write a ton of ugly assertions.

I think it might be pretty straightforward to develop by forking jest-screenshot, or just creating something from scratch using some of the libraries jest-screenshot uses.

Monkey picture: https://unsplash.com/photos/nLXOatvTaLo
Candy world picture: https://www.behance.net/gallery/28128663/Candy-world-Playground-Disney

corysimmons / blog