Testing: Run Core unit tests

ockham commented 3 years ago

We're running Core's PHP unit tests against the GB plugin in our CI environment (on WordPress.com). When installing GB 9.2.1, one of those tests errored because of an issue. (A fix is underway.) That issue wasn't caught in Gutenberg's own tests, since we're not running Core's unit tests.

To catch these issues prior to a release, I propose we run Core's unit tests against GB inside of this repo -- e.g. as part of the unit tests GitHub action, so that it's run on every PR, and informs the author of any errors.

cc/ @sirreal @WunderBart @fullofcaffeine @nosolosw @gziolo

ockham commented 3 years ago

Living comment to collect some relevant information:

Core tests are run like this: https://github.com/WordPress/wordpress-develop/blob/a1403240bba2fb2efccc2bce70cad01fe1d5b6c2/package.json#L168

This command invokes this little script, which basically loads environment variables from the repo's .env file, and executes the phpunit command inside of the phpunit container: https://github.com/WordPress/wordpress-develop/blob/a1403240bba2fb2efccc2bce70cad01fe1d5b6c2/tools/local-env/scripts/docker.js#L6

The phpunit container is configured here: https://github.com/WordPress/wordpress-develop/blob/a1403240bba2fb2efccc2bce70cad01fe1d5b6c2/docker-compose.yml#L95-L125. It's based on the wordpressdevelop/phpunit Docker image.

That image's source is here: https://github.com/WordPress/wpdev-docker-images/blob/55b9dc8b2a1576e5712a38a396ec1cea2b30c960/templates/Dockerfile-phpunit.template

Gutenberg uses the same phpunit Docker container, but configures it somewhat differently: https://github.com/WordPress/gutenberg/blob/95188199c8b8045322d7f75a2666d47ea6504ad2/packages/env/lib/build-docker-compose-config.js#L167-L179

It seems like that the wordpressdevelop/phpunit Docker image doesn't contain the unit tests from wordpress-develop, but only phpunit; apparently, wordpress-develop mounts its unit tests into the Docker container to run them. Not sure if there is a Docker image that contains those. :thinking:

We might need to explore alternatives, such as cloning the wordpress-develop git repo into our Docker container in order to be able to run those tests.

desrosj commented 3 years ago

You are correct that the PHPUnit image does not contain the actual tests. The three primary containers that are set up are nginx:alpine, PHP, and MySQL. All containers are started on the same network, though. So each container has full access to anything within the other because they are networked and share a working directory.

If the goal is to run the WP Core tests with the GB tests, I'm happy to work on that. I'm currently working on converting Core's automated testing from Travis to GitHub Actions. Once the Core test suite is refined a bit more, the plan is to publish the steps as an action so that other projects can pull in Core and run those tests in their project.

But I'm not sure how easy it will be, as any Core test that expects a specific outcome will fail if the plugin alters the default behavior at all. We may need to add some tags to test classes to skip them when not running them in the context of default WordPress Core.

ockham commented 3 years ago

You are correct that the PHPUnit image does not contain the actual tests. The three primary containers that are set up are nginx:alpine, PHP, and MySQL. All containers are started on the same network, though. So each container has full access to anything within the other because they are networked and share a working directory.

If the goal is to run the WP Core tests with the GB tests, I'm happy to work on that. I'm currently working on converting Core's automated testing from Travis to GitHub Actions. Once the Core test suite is refined a bit more, the plan is to publish the steps as an action so that other projects can pull in Core and run those tests in their project.

Oh, that sounds really handy! Thanks for working on that :pray:

But I'm not sure how easy it will be, as any Core test that expects a specific outcome will fail if the plugin alters the default behavior at all. We may need to add some tags to test classes to skip them when not running them in the context of default WordPress Core.

I think that by default, Core tests should be required to pass for any GB PR; after all, GB can be installed as a plugin against the current stable Core version, so I think it's reasonable to assume that that version's tests pass. For the cases where Gutenberg does change Core's behavior so that tests fail (through a filter etc), we might need to add some mechanism to adapt for that, e.g. for skipping a test :thinking:

ockham commented 3 years ago

We just had another instance of an issue with GB 9.3 failing Core's unit tests (and likely hinting at an underlying issue) that probably could've been caught early by running Core's unit tests against every GB PR (as a GH action):

Core's test_get_custom_logo, test_has_custom_logo, and WP_Test_REST_Settings_Controller's test_get_items unit tests are failing for GB 9.3. (Currently debugging internally, I can share more details once I've tracked it down.)

This was probably caused by #26500, which now registers a few settings unconditionally that were previously only enabled if the Site Editor (FSE) experiment was enabled.

@desrosj Any news on your project? Is there any repo/PR that we can keep tabs on/contribute to? :blush:

We might want to bump priority for this issue, as having those unit tests should prevent a number of bugs in the future.

cc/ @youknowriad @mcsf @gziolo @WunderBart @sirreal

ockham commented 3 years ago

We just had another instance of an issue with GB 9.3 failing Core's unit tests (and likely hinting at an underlying issue) that probably could've been caught early by running Core's unit tests against every GB PR (as a GH action):

Core's test_get_custom_logo, test_has_custom_logo, and WP_Test_REST_Settings_Controller's test_get_items unit tests are failing for GB 9.3. (Currently debugging internally, I can share more details once I've tracked it down.)

Found #25173 which seems to describe this issue. Posted internal debugging notes at https://github.com/WordPress/gutenberg/issues/25173#issuecomment-724060904, so we can keep any communication directly related to that issue there, and keep this space here only for discussion about runnind Core unit tests in the GB repo.

ockham commented 3 years ago

But I'm not sure how easy it will be, as any Core test that expects a specific outcome will fail if the plugin alters the default behavior at all. We may need to add some tags to test classes to skip them when not running them in the context of default WordPress Core.

I think that by default, Core tests should be required to pass for any GB PR; after all, GB can be installed as a plugin against the current stable Core version, so I think it's reasonable to assume that that version's tests pass. For the cases where Gutenberg does change Core's behavior so that tests fail (through a filter etc), we might need to add some mechanism to adapt for that, e.g. for skipping a test

To follow up on this note, I think the biggest class of tests that typically need updating because of a GB change are the block snapshots (aka full-content tests). Those exist as part of Core's tests, and typically need updating when a block changes -- e.g. an attribute is added or removed. However, I think we can get away by running Gutenberg's own version of those tests (which will typically be updated as part of a PR that makes such a change to a block) rather than Core's. (Finally, those might actually count as e2e rather than unit tests, so they might not be part of the set of Core unit tests at all.)

mcsf commented 3 years ago

@ockham: I'm sorry you haven't gotten replies here yet!

On the surface, this is something that elicits a "yes, please" from me. However, once I think a bit about it, I hesitate.

On whom is the onus of integration? If the purpose of running Core tests on the plugin's master were to catch potential structural issues early and reduce the burden of merging Gutenberg in Core with each WP release cycle, I'd say sure. If the purpose is to prevent PR merges that might break a particular environment's test suite, I'm not so sure.

Are we conflating unit and integration testing? Can you tell us more about these integration failures? Looking at the instance you provided, the issue wasn't a mismatch of explicit test expectations ($this->assertEquals( $expected, $actual )), but a failure stemming from a runtime error (in this case, ArgumentCountError, but any other type errors apply) due to a plugin object replacing a core object (in this case, WP_Block_Type). Normally, unit tests would test a self-contained and nothing else; however, in a dynamic and weakly typed environment as PHP, what is happening is that we usurp the testing environment by loading core + plugin together and seeing if anything breaks. (As a concrete example, Core's test_dynamic_block_renders_string is meant to test Core's register_block_type and WP_Block_Type, but these are replaced by the Gutenberg.) In other systems, this would be done with actual integration testing or, at the very least, by verifying that there are no compile-time errors — thus catching type error and drawing attention to API changes. Are other failures that you have experienced alike?

I ask these question to make sure we're considering the full picture. If this is more of a question of specific integration issues, then it might make sense to devise specific integration tests. Furthermore, answering the question of the onus will help us determine where and when they should run.

ockham commented 3 years ago

@ockham: I'm sorry you haven't gotten replies here yet!

No worries -- makes me appreciate yours all the more! :wink:

On the surface, this is something that elicits a "yes, please" from me. However, once I think a bit about it, I hesitate.

On whom is the onus of integration? If the purpose of running Core tests on the plugin's master were to catch potential structural issues early and reduce the burden of merging Gutenberg in Core with each WP release cycle, I'd say sure. If the purpose is to prevent PR merges that might break a particular environment's test suite, I'm not so sure.

I'm quite strongly leaning toward the latter. I'll elaborate below :slightly_smiling_face:

Are we conflating unit and integration testing? Can you tell us more about these integration failures? Looking at the instance you provided, the issue wasn't a mismatch of explicit test expectations ($this->assertEquals( $expected, $actual )), but a failure stemming from a runtime error (in this case, ArgumentCountError, but any other type errors apply) due to a plugin object replacing a core object (in this case, WP_Block_Type). Normally, unit tests would test a self-contained and nothing else; however, in a dynamic and weakly typed environment as PHP, what is happening is that we usurp the testing environment by loading core + plugin together and seeing if anything breaks. (As a concrete example, Core's test_dynamic_block_renders_string is meant to test Core's register_block_type and WP_Block_Type, but these are replaced by the Gutenberg.) In other systems, this would be done with actual integration testing or, at the very least, by verifying that there are no compile-time errors — thus catching type error and drawing attention to API changes. Are other failures that you have experienced alike?

Yes! My thinking on this has lately been influenced even more by a more recent example. The tl;dr is

Core's test_get_custom_logo, test_has_custom_logo, and WP_Test_REST_Settings_Controller's test_get_items unit tests are failing for GB 9.3. [...]

This was probably caused by #26500, which now registers a few settings unconditionally that were previously only enabled if the Site Editor (FSE) experiment was enabled.

It later turned out that this had some actual user-visible impact, and had been reported as a bug already: #25173.

To me, this seems like a rather compelling case: We ended up with a bug in the 9.3 release that we could've avoided, had we run Core's tests against individual PRs, as it would've alerted the PR author about a problem that they were unaware of, and it would've given them the opportunity to fix that issue.

I ask these question to make sure we're considering the full picture. If this is more of a question of specific integration issues, then it might make sense to devise specific integration tests. Furthermore, answering the question of the onus will help us determine where and when they should run.

And I appreciate these questions! While I've only recently started collecting these bugs, we've had a number of similar instances before I filed this issue. The practical problem with buggy GB releases is that they create a bottleneck in one or two specific places: Such bugs will typically surface

either at Core merge time, and will have to be addressed by the people working on the merge (or potentially be delegated to the original author); or even after a Core release, in which largely the same applies;
or earlier, when someone attempts to run the Gutenberg plugin in a development/testing or production environment. In this case, we can expect PRs to be made against the repo, removing some burden from the Core merge.

In both cases, people finding the bug might not be too familiar with its context, and will have to spend some time getting acknowledged with the 'surrounding' code: this is the bottleneck I'm talking about. This can take up significant amounts of the people in charge of Core merges, or plugin deployment. Delegating to original authors isn't always an option, especially in the second setting: If a team tries to keep up with the 2-week release cadence of Gutenberg, the original authors might not have enough time or resources to spend in order to guarantee that the bug will be fixed in a timely manner.

Given that we have tests at our disposition to catch a whole class of those issues much earlier -- at the time a PR is created or pushed to -- I think it'd be great to actually leverage that potential. I agree that there are some technicalities to be worked out, as you've mentioned -- How do we deal with those tests if we're deliberately changing behavior compared to the tests' expectations? -- and I've tried to touch upon that in a few of my earlier comments on this issue. (As you said, e.g. if a Core test is overridden by a Gutenberg counterpart -- I think the rule of thumb should be to run the GB one and skip the Core one.) But overall, I think that this is something we should at least give a try, and adapt iteratively as we learn where and how we have to modify that testing behavior.

youknowriad commented 3 years ago

My own impression is that are very rare cases where these tests will break (maybe once per major release) and maybe not worth the allocated time and resources to run another job in our CI. That said, I'm open to trying and monitoring with the possibility to remove them if they prove useless or just blocking folks for nothing.

mcsf commented 3 years ago

My own impression is that are very rare cases where these tests will break (maybe once per major release) and maybe not worth the allocated time and resources to run another job in our CI. That said, I'm open to trying and monitoring with the possibility to remove them if they prove useless or just blocking folks for nothing.

Should there be release-bound tests? I ask this because I too was thinking about how the cost (both in computational and human impact) of running Core tests on every Gutenberg item may be too high, and I don't know what other CI options we have at our disposal in GitHub.

So, in alternative, maybe we could add a step to ./bin/plugin/cli.js rc|stable consisting of running a specific test suite. Perhaps we could have a shortcuts like ./bin/plugin/cli.js test|dry-run to run the tests without starting a release. Is this worth pursuing at all? My idea is to find a balance between imposing these tests on everyone vs. no one — maybe this too hidden?

jordesign commented 1 year ago

@ockham I'm just looking back through some older issues - wanted to check if the tests that have been added over time resolve this request, or if it is still relevant?

ockham commented 1 year ago

@jordesign Still relevant, I'm afraid. The issue isn't so much about adding tests but rather running Core's test suite automatically in the GB repo to make sure that GB's code (i.e. especially the PHP) doesn't break Core's tests.

jordesign commented 1 year ago

No worries - thanks @ockham

WordPress / gutenberg

Testing: Run Core unit tests #26418