gitpod-io / gitpod


[tests] regressions for workspace components are hard to prevent #13598

Closed: kylos101 closed this issue 1 year ago

kylos101 commented 1 year ago

Is your feature request related to a problem? Please describe

We're seeing some PRs merged to main while integration tests are still failing for workspace components. This is expected, because tests are not blocking.

However, in hindsight and practice, non-blocking tests create pain. For example:

  1. It creates unexpected pressure as we get close to deploying when tests are failing (literally today).
  2. Additionally, the problem is compounded when new branches from main contain the same failing tests (this also happened today, and it leaves branch authors unsure whether they broke the tests).

Describe the behaviour you'd like

Avoid failing tests in main by making passing integration tests a required check before a merge is allowed (similar to how a successful build and an approval are required, and there cannot be holds). Note: this check should perhaps be limited to workspace tests for now; I'm unsure of the state of the tests for webapp and IDE.

This will minimize pain for teams, as we tend to deploy the SaaS from main. Additionally, it'll save time for peers in product engineering, who may think they broke tests when in actuality the tests were already broken in main when their branch was created.

Workspace tests take 33m to run on a regular vm, 20m on a large vm.
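
As a concrete illustration, here is a minimal sketch of how the integration-test job could be registered as a required status check on main via GitHub's branch-protection REST API. The check context name `ci/werft/workspace-integration-tests` and the admin-scoped token in `GITHUB_TOKEN` are assumptions for this sketch, not names that exist in our setup today.

```go
// Sketch only: add an (assumed) workspace integration-test context to the
// required status checks of the main branch via the GitHub REST API.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	token := os.Getenv("GITHUB_TOKEN") // needs admin access to the repo

	// Context name is an assumption; it must match the name the CI job reports.
	contexts := []string{"ci/werft/workspace-integration-tests"}
	body, err := json.Marshal(contexts)
	if err != nil {
		log.Fatal(err)
	}

	url := "https://api.github.com/repos/gitpod-io/gitpod/branches/main/protection/required_status_checks/contexts"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("branch protection update:", resp.Status)
}
```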

Describe alternatives you've considered

A flaky test or a false test failure could prevent a clean branch and PR from merging, so we'd need some way to override.

Options:

  1. Ignore the flaky tests in a follow-on commit (a good option for flaky tests, assuming the code author can prove the test is in fact flaky; the reviewer would be expected to challenge for flakiness, too).
  2. Use a certain label on the PR and, if present, bypass the check (see the sketch after this list).
  3. Something else I haven't thought of
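
To make option 2 concrete, here is a minimal sketch of how a gate could look up the PR's labels and skip the blocking behaviour when a bypass label is present. The label name `bypass-integration-tests` and the exit-code convention are purely hypothetical.

```go
// Sketch only: exit 0 (skip the gate) if the PR carries an assumed bypass
// label, otherwise exit non-zero so the blocking check stays in force.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

type label struct {
	Name string `json:"name"`
}

func main() {
	prNumber := os.Args[1] // PR number passed in by the CI job
	url := fmt.Sprintf("https://api.github.com/repos/gitpod-io/gitpod/issues/%s/labels", prNumber)

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	var labels []label
	if err := json.NewDecoder(resp.Body).Decode(&labels); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	for _, l := range labels {
		if l.Name == "bypass-integration-tests" { // assumed label name
			fmt.Println("bypass label present: skipping the blocking check")
			os.Exit(0)
		}
	}
	fmt.Println("no bypass label: integration tests must pass")
	os.Exit(2)
}
```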

Additional context

meysholdt commented 1 year ago

This issue just got 5 upvotes in the prod eng retrospective. Let's address it early next week:

[screenshot: retrospective upvotes]
meysholdt commented 1 year ago

hey @kylos101, do you have examples of PRs that got merged even though the tests were failing? I would like to better understand why failing tests are not blocking.

kylos101 commented 1 year ago

Hi @meysholdt, failing tests are not blocking because humans are fallible (sometimes they forget to run tests, or to check that the tests pass). :grin: In other words, we don't have automation set up via a GitHub check to mandate that if a workspace component changes, the tests must run and pass prior to merge. That's the intent of this issue.

Here is another data point: PR merges can be delayed because humans approve, add a "hold", and then neglect to "unhold" in a timely manner.

So, there are a couple of reasons to have test runs automated and added as a check that prevents merge:

  1. Function as a gatekeeper bot: automatically run the tests when workspace components change, and prevent merge to main unless they pass, thus protecting main (see the check-run sketch after this list).
  2. Function as a time-saving bot: save humans time by removing the manual hold/unhold step. Example. cc: @aledbf
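
For illustration only, here is a rough sketch of how such a gatekeeper bot could report its verdict as a GitHub check run, which branch protection can then require. The check name is an assumption, and the Checks API needs a GitHub App installation token (referenced here as `GITHUB_APP_TOKEN`), not a plain personal access token.

```go
// Sketch only: report the outcome of the workspace integration tests as a
// GitHub check run on the PR's head commit.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func reportCheck(headSHA string, passed bool) error {
	conclusion := "failure"
	if passed {
		conclusion = "success"
	}

	payload := map[string]string{
		"name":       "workspace-integration-tests", // assumed check name
		"head_sha":   headSHA,
		"status":     "completed",
		"conclusion": conclusion,
	}
	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodPost,
		"https://api.github.com/repos/gitpod-io/gitpod/check-runs",
		bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_APP_TOKEN"))
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println("check run reported:", resp.Status)
	return nil
}

func main() {
	// Example: mark the check green for the commit SHA passed as an argument.
	if err := reportCheck(os.Args[1], true); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```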

Let me know if you need anything else?

kylos101 commented 1 year ago

Moved to current week so it doesn't fall off of the radar. :pray:

meysholdt commented 1 year ago

hi @kylos101! Thank you for the details. I see two use cases here; can you clarify which one you're after?

  1. When an engineer runs integration tests on a PR and the tests fail, the failing tests do not prevent the PR from being merged. ---> If this is true, it's a bug. I created this PR https://github.com/gitpod-io/gitpod/pull/14477 to reproduce the issue, but the behavior is as expected: failing integration tests cause the build to fail. Since ci/werft/build/ is a mandatory GitHub check for the main branch, the PR cannot be merged when the build fails.

  2. The integration tests should run automatically: an engineer should not need to activate them via the PR description; instead, the build should figure out which components have changed and run integration tests as needed. This implies that an engineer cannot skip integration tests if the build decides it is necessary to run them (see the sketch below).
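
To illustrate what that decision could look like, here is a minimal sketch of change detection against main. The path prefixes are illustrative assumptions, not an authoritative list of workspace components, and the diff base (`origin/main`) is an assumption about how the build would compute changed files.

```go
// Sketch only: decide from the changed paths whether the workspace
// integration tests are mandatory for this build.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Assumed set of workspace component paths; the real mapping would live in CI config.
var workspacePrefixes = []string{
	"components/ws-manager",
	"components/ws-daemon",
	"components/ws-proxy",
	"test/", // the integration tests themselves
}

func workspaceTestsRequired(changedFiles []string) bool {
	for _, f := range changedFiles {
		for _, p := range workspacePrefixes {
			if strings.HasPrefix(f, p) {
				return true
			}
		}
	}
	return false
}

func main() {
	// List files changed relative to main.
	out, err := exec.Command("git", "diff", "--name-only", "origin/main...HEAD").Output()
	if err != nil {
		panic(err)
	}
	changed := strings.Split(strings.TrimSpace(string(out)), "\n")

	if workspaceTestsRequired(changed) {
		fmt.Println("workspace components changed: integration tests are mandatory")
	} else {
		fmt.Println("no workspace changes: integration tests can be skipped")
	}
}
```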

kylos101 commented 1 year ago

Hi @meysholdt, thank you for clarifying; I'm after use case 2. Tests take 30m with a regular VM, 20m with a large VM. Thanks, @utam0k, for improving test run times! Internal ref.

kylos101 commented 1 year ago

Here's another reason to require passing tests prior to merge.

https://gitpod.slack.com/archives/C03E52788SU/p1667954858802449

The Workspace tests are broken. 😢

It's possible we broke them? But hard to tell, so now we have to go splunking.

It turns out there was a code change on the WebApp side that broke Workspace tests. In other words, we're more coupled than we'd like.

Short term, it'd be great if Workspace tests had to pass before a merge to main is allowed. Long term, we'll need to refactor the tests so that the Workspace tests can pass in isolation from WebApp and IDE.

Workspace tests take 33m to run on a regular VM, 20m on a large VM. Thank you, @utam0k, for this change!

kylos101 commented 1 year ago

Hi @meysholdt, here is an example where a change to supervisor (an IDE component) was required to fix a broken integration test for the dotfiles feature. Can you share an ETA for the RFC, to help find breakage sooner, before we merge to main? I understand it could be a while as we approach the end of the year.

cc: @akosyakov @utam0k @mustard-mh

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.