WordPress / blueprints-library


Rework ci and add e2e tests scaffolding #109

Open reimic opened 5 months ago

reimic commented 5 months ago

Summary

Rework CI with two dispatch workflows: an automatic one that acts as a quick but broad quality gate for all pushes and pull requests, and a manual one that offers a nice and easy way of getting details on failing tests.

The first step toward achieving robust end-to-end (#108) coverage.

Major Changes

The manual dispatch allows control over the branch it runs on, the PHP version, the OS and the test suite. Every default is set to all, and the FAIL_FAST parameter is always set to false. This dispatches a run whose configuration is broad and offers maximum information on the condition of the tested branch.
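For illustration, the manual workflow's trigger looks more or less like this (the input names, descriptions and defaults here are illustrative rather than verbatim; the branch itself is chosen via the built-in ref selector, not an input):

on:
  workflow_dispatch:
    inputs:
      php:
        description: PHP version to test ("all" runs every supported version)
        default: all
      os:
        description: OS to run on ("all" runs on every runner)
        default: all
      test_suite:
        description: PHPUnit test suite to run ("all" runs both unit and e2e)
        default: all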

The automatic dispatch is more rigid and always tests all PHP versions and OSes with all test suites. It also has the FAIL_FAST parameter set to true, which introduces the --stop-on-defect --fail-on-warning --fail-on-risky PHPUnit flags for enhanced quality control. Note that this is a job-level feature: the whole test matrix still always runs (3 OSes * 7 PHP versions => 21 jobs). The benefit of the FAIL_FAST parameter is that each job runs as few tests as possible to ascertain whether an OS-PHP version combination is stable. A sketch follows below.
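In sketch form, assuming the jobs use shivammathur/setup-php and with an illustrative (not verbatim) PHP version list:

jobs:
  test:
    strategy:
      fail-fast: false            # matrix-level: all 21 jobs always run and report
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        php: ['7.2', '7.3', '7.4', '8.0', '8.1', '8.2', '8.3']
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: shivammathur/setup-php@v2
        with:
          php-version: ${{ matrix.php }}
      - run: composer install --no-interaction
      # FAIL_FAST=true maps to PHPUnit's own stop/fail flags inside each job
      - run: vendor/bin/phpunit --stop-on-defect --fail-on-warning --fail-on-risky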

I hope this will enable a workflow that goes more or less like this:

  1. You work on a feature on your local fork.
  2. After the work is done you run tests locally with vendor/bin/phpunit, but unless you are really into PHP versions this will run on the single version you have configured as your default project interpreter, e.g. C:\php\php-8.3.exe. Obviously, the local test run also only exercises the code on your own OS. For example's sake, let's say you are developing on a Mac and that all tests have passed.
  3. You push your code and the automatic dispatch kicks in. GitHub quickly informs you that some of the jobs failed. Since all 21 jobs have run, you have an instant and quite broad idea of what is going on. Let's assume your Mac-coding skills are flawless and all but the Windows jobs have passed.
  4. You go to the manual dispatch, pick your branch, choose windows-latest as the OS to run on, and decide to check all PHP versions. With an initial idea of the scope of your changes and data from the automatic runs, you might also have concluded that only the e2e tests actually failed. To save a few precious seconds you decide to only run the e2e suite (see the command sketch after this list).
  5. You enjoy precise info and helpful stack traces for the failing Windows tests.
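For reference, running a single suite locally boils down to one flag (the suite names assume matching <testsuite> entries in phpunit.xml):

# run only the e2e suite; swap in "unit" for the unit suite
vendor/bin/phpunit --testsuite e2e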

Example Usage

  1. In the Actions tab, pick Manual Tests Dispatch.
  2. Click the Run workflow button.
  3. Customize your dispatch run with inputs:
    • pick which branch to test,
    • pick which PHP version to use (all is the default and will use all versions),
    • pick which OS to run on (all is the default and will run on all OSes),
    • pick which PHPUnit test suite to run (all is the default and will run both unit and e2e tests).
  4. Inspect the final dispatch configuration.
  5. Dispatch the tests.
  6. Enjoy the autogenerated run name on GitHub.
  7. Inspect the results (in the example only unit tests were selected, so the e2e tests were skipped).
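The same dispatch should also be possible from the gh CLI (the workflow file name, branch name and input names below are assumptions for illustration):

gh workflow run manual-tests.yml --ref my-feature-branch -f php=all -f os=windows-latest -f test_suite=e2e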
adamziel commented 5 months ago

How slow is the manual workflow? Couldn't it just run automatically? Typically everyone forgets about these manual workflows and they get stale and start failing soon.

reimic commented 5 months ago

Typically everyone forgets about these manual workflows and they get stale and start failing soon.

Couldn't it just run automatically?

I see why you would inquire about that. To clarify: it's not possible for them to get stale or desync in any way - they run the same test suites that are run automatically. This is not a different set of tests - it is a different way of running tests, with modularity in mind.

When considering their usefulness, think "dev tool", not "quality gate". All test suites still run automatically on every push and PR.

How slow is the manual workflow?

On my machine the e2e tests already take a while (with just 3/4 implemented). Mind that this is for a single PHP-OS run. The containers on GitHub are a bit faster, since the download times are much better (probably due to mirrors and data transfer speed). They also try to run in parallel, yet they start to queue up quite quickly. To speak beyond numbers: I noticed the issue almost instantly, and it worried me enough that I've tackled it on two fronts - see "observations" in #108. If not addressed, it will hinder our goal of offering superb reliability, since that requires tests for various complex blueprints with larger downloads.

To reiterate:

adamziel commented 5 months ago

Can you make it fail fast and also produce the full error output?

adamziel commented 5 months ago

E.g. mark the step as failed to make the job red, and then, on failure, start a full test run that won't stop after the first failure.

reimic commented 5 months ago

E.g. mark the step as failed to make the job red, and then, on failure, start a full test run that won't stop after the first failure.

Sure. GitHub steps produce a conclusion and an outcome that can be used in logical expressions within a job. (Side note: a workflow has jobs, which may include steps.) There is a subtle difference between the two - with continue-on-error, a step's outcome records the failure while its conclusion still reads success - and that might be helpful here. Workflows share this feature and can trigger other workflows depending on their status:

on:
  workflow_run:
    workflows: [Build]
    types: [completed]

jobs:
  on-success:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - run: echo 'The triggering workflow passed'
  on-failure:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    steps:
      - run: echo 'The triggering workflow failed'
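And at step level, a minimal sketch of the pattern you describe, assuming the same PHPUnit entry point as above:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # checkout, setup-php and composer install omitted for brevity
      - id: fast
        continue-on-error: true                 # job keeps going; outcome still records the failure
        run: vendor/bin/phpunit --stop-on-defect
      - if: steps.fast.outcome == 'failure'     # rerun without stopping, for the full error output
        run: vendor/bin/phpunit
      - if: steps.fast.outcome == 'failure'     # finally mark the job red
        run: exit 1

If the full rerun itself fails, the job goes red at that step anyway; the final step covers the flaky case where the rerun happens to pass.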

Help me see your point. What you are saying is that the initial automatic fast-failing workflow would trigger another, more robust workflow. That is fine and clearly doable, but the new jobs will compete over the same resource pool.

I could explore 2 strategies here:

Of course all this can notify interested parties and all that jazz, but that is beside the point here.

I am on the fence here. With the current state of this PR, if an automatic workflow fails, you can decide whether it is worth investigating further, and if it is - for the cheap price of 2 clicks - run the manual workflow on your own fork to see how your code performs for the PHP-OS combination of your choosing.

I see how this could be improved upon, but my recommendation is as follows: merge this as it is. I'll add several e2e tests next, and once I am done and execution times are more apparent, we'll circle back to this issue and evaluate how useful it has proven to be and how it could be enhanced.

adamziel commented 5 months ago

I mean, the tests we have took 37s to finish, so this fast/slow mode seems a bit premature. Let's just ship the full version and scale back as needed.