Closed dliappis closed 9 months ago
This is ready for review. Because of the multiple ways that this pipeline may run -- via unified release trigger, via branch commits
or pr commits
-- we need to support a different way of running depending on the occasion:
1) If we open the PR it (e.g. the latest commit here: https://buildkite.com/elastic/elastic-stack-installers/builds/3720) it looks like:
![image](https://github.com/elastic/elastic-stack-installers/assets/1754575/ec897541-f881-42d5-9ef6-93e91246fab7)
2) If the pipeline gets triggered by the unified release it should run as before (and allow to go through the DRA publish).
3) Special handling is done for staging builds:
- Unified release may trigger the pipeline but only for non main branches and specifically if they target `[0-9]+.[0-9]+` matching branches. Tests are skipped.
- PRs are only allowed to run the steps (including tests) if they target non-main branch (the reason is that there is no https://staging.elastic.co/latest/main.json, so we can't deduce a valid `MANIFEST_URL`)
4) We've removed runs based on push events -- e.g. cutting a new branch from main
-- (which used to be allowed before); this is because it complicates unnecessarily the conditionals. This use case (and more) will be covered in a follow up PR, through a scheduled job that runs at regular intervals.
Example run: https://buildkite.com/elastic/elastic-stack-installers/builds/3720 using this PR
@strawgate the tests seem to be failing recently with unknown flag: --delayenroll
, see https://buildkite.com/elastic/elastic-stack-installers/builds/3720#018d785e-7668-4794-8a0f-2f39dbe52b5e/3827-4851
any thoughts on this?
@strawgate the tests seem to be failing recently with
unknown flag: --delayenroll
, see https://buildkite.com/elastic/elastic-stack-installers/builds/3720#018d785e-7668-4794-8a0f-2f39dbe52b5e/3827-4851any thoughts on this?
That's just the debug output for a negative test case. The failing test case is:
[-] Elastic Agent MSI Installer.Basic Tests for Standalone mode.Blocks downgrades and upgrades in Standalone mode 44.52s (44.52s\|4ms)
--
| [0] Expected $true, but got $false.
| at Has-AgentStandaloneLog \| Should -BeTrue, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:28
| at <ScriptBlock>, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:28
| at Assert-AgentHealthy, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:119
| at <ScriptBlock>, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:177
| [1] Expected $false, because The agent should have been cleaned up already, but got $true.
| at Is-AgentBinaryPresent \| Should -BeFalse -Because "The agent should have been cleaned up already", C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:72
| at Check-AgentRemnants, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:72
| at <ScriptBlock>, C:\buildkite-agent\builds\bk-agent-prod-gcp-1707121483772742297\elastic\elastic-stack-installers\src\agent-qa\msi.tests.ps1:109
Looking into this
Pushed a commit, i think it was just a race condition between the agent starting up and us checking its log file
Build failed with:
Starting a Gradle Daemon, 1 incompatible Daemon could not be reused, use --status for details
--
|
| FAILURE: Build failed with an exception.
|
| * What went wrong:
| The specified project directory '/release/project-configs/fix-pr-tests' does not exist.
|
| * Try:
| > Run with --stacktrace option to get the stack trace.
| > Run with --info or --debug option to get more log output.
| > Run with --scan to get full insights.
|
| * Get more help at https://help.gradle.org
|
| BUILD FAILED in 2s
| 🚨 Error: The command exited with status 1
re-running build now
Build failed with:
Starting a Gradle Daemon, 1 incompatible Daemon could not be reused, use --status for details -- | | FAILURE: Build failed with an exception. | | * What went wrong: | The specified project directory '/release/project-configs/fix-pr-tests' does not exist. | | * Try: | > Run with --stacktrace option to get the stack trace. | > Run with --info or --debug option to get more log output. | > Run with --scan to get full insights. | | * Get more help at https://help.gradle.org | | BUILD FAILED in 2s | 🚨 Error: The command exited with status 1
re-running build now
This is an unrelated failure in the DRA publish step: we allow running the publish step in dry run mode from within PRs, but it seems to me that we shouldn't: the unified release expected certain hard coded directories (e.g. 8.13
, main
etc.) so maybe the best approach would be to skip the DRA step altogether when running from a PR. @DaveSys911 what do you think?
@strawgate FYI we are still having failures related to tests after your latest commits: https://buildkite.com/elastic/elastic-stack-installers/builds/3747
@strawgate FYI we are still having failures related to tests after your latest commits: https://buildkite.com/elastic/elastic-stack-installers/builds/3747
Yep will look today, my last commit was to add some additional debug info
It looks like the last test run passed, will rerun a couple of times to make sure it's not flaky
@strawgate your latest commit seems to have fixed the tests: https://buildkite.com/elastic/elastic-stack-installers/builds/3750#018d7eb8-cb49-4989-ad1b-f5314ec4f0d6
The DRA publish step is failing because we are running from a PR (and the branch in the PR is not main
or 7.x
or 8.x
). I pushed a change to skip this test, if running from within the context of a PR. Hopefully this will lead to 100% successful PR checks.
It looks like the last test run passed, will rerun a couple of times to make sure it's not flaky
Sounds like a great idea!
It looks like the last test run passed, will rerun a couple of times to make sure it's not flaky
We got another fresh failure in https://buildkite.com/elastic/elastic-stack-installers/builds/3751#018d7ee0-3fea-4c2a-abfd-0d6f6f6969e3
fingers crossed on this one -- if this last change doesn't fix it i'll disable the test case that triggers the race condition and fix it in a separate PR
Tests passed with latest change, running another retest to check for flaky tests
@strawgate I think we are good here. I just triggered one more build to check for flakiness: https://buildkite.com/elastic/elastic-stack-installers/builds/3794
could you provide a final review so that we can merge?
Also @amitkanfer could please take a look again?
buildkite test this
could you provide a final review so that we can merge?
done!
PR #212 made some changes on
main
that require the presence ofMANIFEST_URL
. However, the pipeline is still configured to run the same script when a PR (from an upstream branch) is raised and it currently fails.This commit skips tests when triggered from another pipeline (unified release), and allows running tests when triggered by PR creation by picking up a suitable
MANIFEST_URL
.