WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/
Other
10.09k stars 4.04k forks source link

E2E: Eliminate side effects via DB snapshots #61719

Open WunderBart opened 2 months ago

WunderBart commented 2 months ago

What problem does this address?

Improves the individual E2E test isolation to the point where side effects are no longer happening.

What is your proposed solution?

This can be achieved by exporting the database after the global setup is complete and importing it for each test within the page fixture. An example implementation can be found in WooCommerce Blocks, introduced in https://github.com/woocommerce/woocommerce/pull/46125. It's using wp db API to manage the DB snapshots, which has been working reliably, taking an acceptable ~1s for the DB import step.

As rightfully pointed out by @youknowriad here, though, we might not want to use the wp CLI directly as it would mean we can't run the tests against any URL. However, working around it with a dedicated REST API call might be possible. Another notable limitation of this approach is that using hooks other than beforeEach would need to be eslint-banned (via the no-hooks rule) as they either won't have an effect (afterEach, afterAll) or will work for the first test of the current suite only (beforeAll). IMO though, the latter isn't really that much of a limitation since it simplifies the test writing flow.

While I know how to import/export the DB via the wp CLI directly, I'm not sure how to approach it with a REST call. Any ideas? πŸ˜„

cc: @Mamaduka @kevin940726 @youknowriad @oandregal @gigitux @danielwrobert @nerrad

kevin940726 commented 1 month ago

This has been on my mind since we introduced Playwright originally. The ultimate goal is to enable the fullyParallel config and speed up the tests. DB snapshots might be a good way to achieve this, and I personally would love to see experiments around that!

as they either won't have an effect (afterEach, afterAll) or will work for the first test of the current suite only

I don't think we have to eslint-ban these hooks. They are still useful for cleaning up some stuff or for the serial mode.

we might not want to use the wp CLI directly as it would mean we can't run the tests against any URL.

I personally don't see much value in enabling every test to run for any given URL. We should empower our testing framework to do that. Tests shouldn't need to run multiple times to confirm the same thing. We can do this for Core or dotCom but not for every URL. We can definitely add some smoke tests or smaller sets of e2e tests for arbitrary URLs, but that should be done sparingly.

While I know how to import/export the DB via the wp CLI directly, I'm not sure how to approach it with a REST call. Any ideas? πŸ˜„

I'm sure there's a way but I don't know either without trying πŸ˜†.

Mamaduka commented 1 month ago

I agree that banning before and after hooks might not work here. Many tests rely on plugins to test specific scenarios, and we currently use these hooks for activation/deactivation.

cc @swissspidy, @desrosj

swissspidy commented 1 month ago

Improves the individual E2E test isolation to the point where side effects are no longer happening.

Is this currently a huge problem?

In my experience, a solid global setup handler for preparing the environment has always been enough and I didn't need any DB snapshots or anything. Of course for things like visual snapshots such snapshots will lead to more stable results. But then you have portability issues like @youknowriad mentioned.

This has been on my mind since we introduced Playwright originally. The ultimate goal is to enable the fullyParallel config and speed up the tests. DB snapshots might be a good way to achieve this, and I personally would love to see experiments around that!

I don't think snapshots would help here, probably the opposite as every test completely resets the database while another test is running, causing all sorts of side effects and errors.

For truly parallel tests that don't get into the way of each other, each test would need to run against a separate installation (separate URLs, table prefixes, etc.).

So if parallel testing is the ultimate goal, I'd suggest thinking about how to solve that.

kevin940726 commented 1 month ago

For truly parallel tests that don't get into the way of each other, each test would need to run against a separate installation (separate URLs, table prefixes, etc.).

I'm imaging DB snapshots as a method but not the ultimate goal. Any potential method is worth trying! If we could quickly apply DB snapshots and spawn a new environment for each test, that would be pretty cool. WordPress Playground could potentially help here, if we could solve the spinning-up speed.

WunderBart commented 1 month ago

The ultimate goal is to enable the fullyParallel config and speed up the tests.

I don't think this will ever be possible in WP, tbh. We can contain side effects in sharded serial mode (for example, by resetting the database before each test), but I don't think we'll ever be able to do that in the (fully) parallel mode. When tests run in parallel (on a single machine), they must only do read operations because when they start writing to the database, they will start affecting each other, which we won't be able to handle by the db reset or anything of the sort. Actually, resetting the db in fully parallel mode would be a side effect on its own. πŸ˜… I guess the only way to speed up the tests would be to increase the number of shards.... or separate the tests that only read from the database and run them in fully parallel mode, but that sounds difficult to maintain. Anyway, Playwright itself recommends using a single worker in CI to prioritize stability and reproducibility.

I don't think we have to eslint-ban these hooks. They are still useful for cleaning up some stuff or for the serial mode.

I agree that banning before and after hooks might not work here. Many tests rely on plugins to test specific scenarios, and we currently use these hooks for activation/deactivation.

If each test (not each suite!) starts with a clean database, the afterAll and afterEach hooks become obsolete because we no longer need to clean anything up – the db reset will do the cleaning. I'd also ban the beforeAll hook because (in the db reset scenario) it would only affect the first test in the block, as the rest would test against a clean db.

I personally don't see much value in enabling every test to run for any given URL. We should empower our testing framework to do that.

That would definitely make things easier, but wouldn't it make it impossible to run our tests against, e.g., Playground instances?

Improves the individual E2E test isolation to the point where side effects are no longer happening.

Is this currently a huge problem?

In general, it's a big problem when testing against WP (well, at least from my experience). For example, in Woo, we had so many tests written on top of side effects from other tests that we couldn't change the test running order (e.g., by increasing the shard count) because everything started collapsing. You could argue that we didn't do a great job setting up/tearing down the environment, but the reality is that achieving proper test isolation via setup/teardown manually, especially in complex scenarios, is error-prone and difficult to maintain. Without the need for the teardown step, writing tests is just easier and faster.

kevin940726 commented 1 month ago

I don't think this will ever be possible in WP, tbh.

I think nothing is impossible if we leverage the right tool (we have so many nowadays, service workers, URL rewrites, DB cloning, etc.) It just might not be worth the effort πŸ˜….

That would definitely make things easier, but wouldn't it make it impossible to run our tests against, e.g., Playground instances?

Depends on what we want to achieve. Playground is already a pretty unique environment that doesn't really count as a "website". We can definitely write some helpers to make testing Playground URLs easier though.

WunderBart commented 1 month ago

It just might not be worth the effort πŸ˜….

Precisely what I meant, just picked the wrong words! πŸ˜„