elastic / elastic-stack-installers

Windows MSI packages for Elastic stack
Apache License 2.0

The staging workflow fails to produce independent agent release `.msi` binary #282

Open gogochan opened 3 months ago

gogochan commented 3 months ago

While we were testing the independent agent workflow using the branch in https://github.com/elastic/elastic-stack-installers/pull/281, we noticed that the staging workflow does not work with the independent agent release version string.

The Buildkite log shows that the workflow fails while locating the elastic-agent artifact:

ElastiBuild: FindPackage#elastic-agent: Starting...
Searching local directory C:\users\buildkite\esi\bin\in ...
Searching Artifacts API for elastic-agent-8.14.0+build202407021002 ...
ElastiBuild: FindPackage#elastic-agent: System.Net.Http.HttpRequestException: Response status code does not indicate success: 404 (Not Found).
   at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
   at System.Net.Http.HttpClient.GetStreamAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
   at ElastiBuild.Infra.ArtifactsApi.FindArtifact(String target, Action`1 filterConfiguration) in C:\users\buildkite\esi\src\build\Infra\ArtifactsApi.cs:line 106
   at ElastiBuild.BullseyeTargets.FindPackageTarget.RunAsync(BuildContext ctx, String targetName) in C:\users\buildkite\esi\src\build\BullseyeTargets\FindPackageTarget.cs:line 69
   at ElastiBuild.Commands.BuildCommand.<>c__DisplayClass20_0.<<RunAsync>b__2>d.MoveNext() in C:\users\buildkite\esi\src\build\Commands\BuildCommand.cs:line 67
--- End of stack trace from previous location ---
   at Bullseye.Internal.ActionTarget.RunAsync(Boolean dryRun, Boolean parallel, Logger log, Func`2 messageOnly)
ElastiBuild: FindPackage#elastic-agent: Failed! Response status code does not indicate success: 404 (Not Found). (383 ms)
ElastiBuild: ───────────────────────────────────────────────────
ElastiBuild: Duration       Outcome    Target
ElastiBuild: ─────────────  ─────────  ─────────────────────────
ElastiBuild: 383 ms  7.5%   Failed!    FindPackage#elastic-agent
ElastiBuild: 4.73 s  92.5%  Succeeded  BuildBeatPackageCompiler
ElastiBuild: ───────────────────────────────────────────────────
ElastiBuild: Failed! (BuildInstaller#elastic-agent) (5.12 s)
Build staging completed with exit code 0

There could be additional failures in other areas of the workflow. The scope of this issue is to make the workflow support independent agent releases.

Expected Behavior

The pipeline should complete successfully, producing a Windows Installer .MSI for the Independent Agent release.

Actual Behavior

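It fails: the FindPackage#elastic-agent step gets a 404 (Not Found) from the Artifacts API, so no .MSI is produced.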

Steps to reproduce the behaviour

Create a new build at https://buildkite.com/elastic/elastic-stack-installers/builds?branch=pull%2F281%2Fmerge with the following environment variables:

DRA_BRANCH="8.14"
DRA_VERSION="8.14.0+build202407011002"
DRA_WORKFLOW="staging"
MANIFEST_URL="https://staging.elastic.co/independent-agent/8.14.0+build202407021002/manifest-8.14.0+build202407021002.json"
ONLY_AGENT="true"

amitkanfer commented 3 months ago

The code here resolves to the following URL: https://artifacts-api.elastic.co/v1/search/8.14.0+build202407021002/elastic-agent,windows,zip,x86_64,-oss

The artifacts base address is hard-coded here.

Indeed, this URL returns no results. Where should this artifact be available for download?

gogochan commented 3 months ago

I don't think there will be separate artifacts for the Independent Agent release. The independent agent release should continue to use the existing 8.14.0 artifacts.

CC: @cmacknz, @dwhyrock, @DaveSys911

amitkanfer commented 3 months ago

So what's the URL the installer should use?

cmacknz commented 3 months ago

Taking a closer look at this, it looks like we are using the Artifacts API (which we are trying to stop depending on everywhere else because it isn't reliable) to find the artifacts to download.

However, the release manager is going to pass us MANIFEST_URL="https://staging.elastic.co/independent-agent/8.14.0+build202407021002/manifest-8.14.0+build202407021002.json".

That manifest file contains links to the needed artifacts at a standardized JSON path that we could simply parse.

Looking at: https://staging.elastic.co/independent-agent/8.14.0+build202407021002/manifest-8.14.0+build202407021002.json

The Windows download link is at projects.elastic-agent-package.packages.elastic-agent-8.14.0+build202407021002-windows-x86_64.zip, or more generally projects.elastic-agent-package.packages.elastic-agent-$version-windows-x86_64.zip:

{
    "branch": "8.14",
    "release_branch": "8.14",
    "version": "8.14.0+build202407021002",
    "build_id": "8.14.0+build202407021002",
    "build_duration_seconds": 2782,
    "manifest_version": "2.1.0",
    "projects": {
        "elastic-agent-package": {
            "branch": "8.14",
            "commit_hash": "1738179d53e747c48af7350a0b8fe68eda1a5b31",
            "commit_url": "https://github.com/elastic/elastic-agent-package/commits/1738179d53e747c48af7350a0b8fe68eda1a5b31",
            "build_duration_seconds": 0,
            "packages": {
                "elastic-agent-8.14.0+build202407021002-windows-x86_64.zip": {
                    "url": "https://staging.elastic.co/independent-agent/8.14.0+build202407021002/downloads/beats/elastic-agent/elastic-agent-8.14.0+build202407021002-windows-x86_64.zip",
                    "sha_url": "https://staging.elastic.co/independent-agent/8.14.0+build202407021002/downloads/beats/elastic-agent/elastic-agent-8.14.0+build202407021002-windows-x86_64.zip.sha512",
                    "asc_url": "https://staging.elastic.co/independent-agent/8.14.0+build202407021002/downloads/beats/elastic-agent/elastic-agent-8.14.0+build202407021002-windows-x86_64.zip.asc",
                    "type": "zip",
                    "architecture": "x86_64",
                    "os": [
                        "windows"
                    ]
                }
            }
        }
    }
}
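
A minimal Go sketch of that lookup, for illustration only (ElastiBuild itself is C#). It decodes just the fields shown in the manifest and reads projects.elastic-agent-package.packages["elastic-agent-$version-windows-x86_64.zip"]. The type and function names are hypothetical, and it assumes the package key always embeds the manifest's top-level version field, as it does in this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pkgEntry holds the per-package fields used from the manifest.
type pkgEntry struct {
	URL    string `json:"url"`
	ShaURL string `json:"sha_url"`
}

// manifest mirrors only the parts of the release manifest needed here.
type manifest struct {
	Version  string `json:"version"`
	Projects map[string]struct {
		Packages map[string]pkgEntry `json:"packages"`
	} `json:"projects"`
}

// windowsAgentPackage returns the entry at
// projects.elastic-agent-package.packages["elastic-agent-<version>-windows-x86_64.zip"].
func windowsAgentPackage(raw []byte) (pkgEntry, error) {
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return pkgEntry{}, fmt.Errorf("parsing manifest: %w", err)
	}
	project, ok := m.Projects["elastic-agent-package"]
	if !ok {
		return pkgEntry{}, fmt.Errorf("manifest has no elastic-agent-package project")
	}
	// The package key embeds the full version, including any +build suffix.
	key := fmt.Sprintf("elastic-agent-%s-windows-x86_64.zip", m.Version)
	pkg, ok := project.Packages[key]
	if !ok {
		return pkgEntry{}, fmt.Errorf("package %q not found in manifest", key)
	}
	return pkg, nil
}

func main() {
	// Trimmed-down copy of the manifest shown above.
	raw := []byte(`{
  "version": "8.14.0+build202407021002",
  "projects": {"elastic-agent-package": {"packages": {
    "elastic-agent-8.14.0+build202407021002-windows-x86_64.zip": {
      "url": "https://staging.elastic.co/independent-agent/8.14.0+build202407021002/downloads/beats/elastic-agent/elastic-agent-8.14.0+build202407021002-windows-x86_64.zip"
    }}}}
}`)
	pkg, err := windowsAgentPackage(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(pkg.URL)
}
```

The sha_url is decoded too, since a real download step would presumably want to verify the zip against it.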

Could we just use the manifest path instead of the Artifacts API? This logic is the same whether or not it is an early release; only the version changes. You'd get a manifest URL in both cases.

amitkanfer commented 3 months ago

So, if MANIFEST_URL exists, use it; if not, fall back to the existing code we have today. I'll create a PR.
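
That fallback rule could look roughly like the Go sketch below. The function names are hypothetical; the Artifacts API URL format is the one quoted earlier in this thread, and reading DRA_VERSION from the environment in main is an assumption about where the version comes from.

```go
package main

import (
	"fmt"
	"os"
)

// artifactsAPISearchURL reproduces the Artifacts API query quoted earlier in the thread.
func artifactsAPISearchURL(version string) string {
	return fmt.Sprintf(
		"https://artifacts-api.elastic.co/v1/search/%s/elastic-agent,windows,zip,x86_64,-oss",
		version)
}

// agentPackageSource prefers the release manifest when MANIFEST_URL is set and
// otherwise falls back to the existing Artifacts API lookup.
func agentPackageSource(manifestURL, version string) (source, url string) {
	if manifestURL != "" {
		return "manifest", manifestURL
	}
	return "artifacts-api", artifactsAPISearchURL(version)
}

func main() {
	source, url := agentPackageSource(os.Getenv("MANIFEST_URL"), os.Getenv("DRA_VERSION"))
	fmt.Println(source, url)
}
```

Keeping the old path as the default means regular (non-independent) builds behave exactly as before when MANIFEST_URL is unset.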

amitkanfer commented 3 months ago

@cmacknz, the artifact is already downloaded, see here. I believe this is thanks to what @dliappis fixed in this commit. The problem is that the code doesn't find the artifact on disk, so I fixed the relevant regular expressions in this PR.
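
As an illustration only (the expressions actually changed in the PR are not reproduced here), a Go regular expression that finds the downloaded zip on disk needs to accept optional +build<digits> metadata alongside -SNAPSHOT. The pattern and function name are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// agentZipPattern matches the Windows agent zip and captures its version,
// allowing an optional -SNAPSHOT suffix and optional +build<digits> metadata.
var agentZipPattern = regexp.MustCompile(
	`^elastic-agent-(\d+\.\d+\.\d+(?:-SNAPSHOT)?(?:\+build\d+)?)-windows-x86_64\.zip$`)

// agentZipVersion returns the version embedded in name, or false if name does not match.
func agentZipVersion(name string) (string, bool) {
	m := agentZipPattern.FindStringSubmatch(name)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, name := range []string{
		"elastic-agent-8.14.0-windows-x86_64.zip",
		"elastic-agent-8.14.0+build202407021002-windows-x86_64.zip",
		"elastic-agent-8.14.0-linux-x86_64.tar.gz",
	} {
		v, ok := agentZipVersion(name)
		fmt.Printf("%s match=%v version=%q\n", name, ok, v)
	}
}
```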

While this gets past the current problem, I'm now hitting an exception here because, according to WixSharp (the framework we use to compile the MSI), +build... is apparently not a valid version.

cmacknz commented 2 months ago

Replacing the + with . makes sense as a fix. We had to do the same thing for Docker, which doesn't properly support the semantic versioning specification yet either: https://github.com/elastic/elastic-agent/blob/e46bc3535019ac598bdecdaa04ba96cc6bb66c18/dev-tools/mage/dockerbuilder.go#L179-L184

func (b *dockerBuilder) dockerBuild() (string, error) {
    tag := fmt.Sprintf("%s:%s", b.imageName, b.Version)
    // For Independent Agent releases, replace the "+" with a "." since the "+" character
    // currently isn't allowed in a tag in Docker
    // E.g., 8.13.0+build202402191057 -> 8.13.0.build202402191057
    tag = strings.Replace(tag, "+", ".", 1)
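
A self-contained Go version of the same substitution, for trying it out in isolation. The function name and the image name in main are hypothetical; strings.Replace with n = 1 rewrites only the first "+", as in the excerpt.

```go
package main

import (
	"fmt"
	"strings"
)

// dockerTag builds "<image>:<version>" and replaces the first "+" with ".",
// since "+" isn't allowed in a Docker tag.
func dockerTag(imageName, version string) string {
	tag := fmt.Sprintf("%s:%s", imageName, version)
	return strings.Replace(tag, "+", ".", 1)
}

func main() {
	fmt.Println(dockerTag("elastic-agent", "8.13.0+build202402191057"))
}
```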

cmacknz commented 2 months ago

Doing some Windows-specific research, we may run into a limitation: the ProductVersion property only supports the format major.minor.build, where each field is a number with maximum values of 255, 255 and 65535 respectively.
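
To make the constraint concrete: 8.14.0 fits, but the +build suffix makes the third field non-numeric, and the build timestamp 202407021002 is far larger than 65535, so it cannot simply be moved into the build field either. Below is a minimal Go checker; the function name is hypothetical, and it deliberately assumes a strict three-field form (Windows Installer also tolerates a fourth field, which it ignores).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Maximum values of the major, minor and build fields of ProductVersion.
var productVersionMax = []uint64{255, 255, 65535}

// checkProductVersion returns nil if v is major.minor.build with every field a
// plain decimal number within range. This sketch is strict and rejects a fourth field.
func checkProductVersion(v string) error {
	fields := strings.Split(v, ".")
	if len(fields) != len(productVersionMax) {
		return fmt.Errorf("%q: want major.minor.build, got %d fields", v, len(fields))
	}
	for i, f := range fields {
		n, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return fmt.Errorf("%q: field %q is not a number", v, f)
		}
		if n > productVersionMax[i] {
			return fmt.Errorf("%q: field %q exceeds %d", v, f, productVersionMax[i])
		}
	}
	return nil
}

func main() {
	for _, v := range []string{"8.14.0", "8.14.0+build202407021002", "8.14.0.build202407021002"} {
		fmt.Printf("%s: %v\n", v, checkProductVersion(v))
	}
}
```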

That page links to https://learn.microsoft.com/en-us/windows/win32/msi/small-updates, which seems to match how we view independent agent releases today:

A typical small update changes only one or two files or a registry key. Because a small update changes the information in the .msi file, the installation package code must be changed. The package code is stored in the Revision Number Summary property of the Summary Information Stream.

That requires updating the package code, which is a GUID; I hope the MSI build generates a unique one per build, or something to that effect. https://learn.microsoft.com/en-us/windows/win32/msi/revision-number-summary
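
For illustration, a new package code is just a fresh GUID per build. This Go sketch generates a random version 4 GUID and formats it as uppercase hex in braces, the form Windows Installer uses for GUIDs. It is hypothetical: WiX/WixSharp may already generate the package code automatically, which the thread does not confirm.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newPackageCode returns a random version 4 GUID such as
// {XXXXXXXX-XXXX-4XXX-YXXX-XXXXXXXXXXXX} in uppercase hex.
func newPackageCode() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("{%X-%X-%X-%X-%X}", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	code, err := newPackageCode()
	if err != nil {
		panic(err)
	}
	fmt.Println(code)
}
```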

This might matter more if we properly supported upgrades via MSI instead of just re-installing.

amitkanfer commented 2 months ago

@gogochan can you please test against branch https://github.com/elastic/elastic-stack-installers/tree/custom_build_version and let me know if it fixes your problem?

gogochan commented 2 months ago

I cherry-picked the commit and ran the pipeline.

https://buildkite.com/elastic/elastic-stack-installers/builds/5603#019093c5-004a-4bc2-b5be-f9f41aa0015c

Packaging worked successfully, thanks @amitkanfer !!!

Now it fails at https://github.com/elastic/elastic-stack-installers/blob/custom_build_version/.buildkite/scripts/dra-publish.sh#L34. The manifest is passed to the release manager at https://github.com/elastic/elastic-stack-installers/blob/custom_build_version/.buildkite/scripts/dra-publish.sh#L60.

The requested artifact does not exist for the independent agent release: https://artifacts-staging.elastic.co/beats/latest/8.14.0+build202407011002.json

Is this parameter still required when we do an ONLY_AGENT build?

CC @DaveSys911 @dliappis

dliappis commented 2 months ago

@gogochan I think we simply need to enhance the VERSION filter so that it also removes the suffix after + (e.g. if VERSION is 8.14.0+build202407021002). Currently the filter only trims suffixes like -SNAPSHOT, as the comment here shows: https://github.com/elastic/elastic-stack-installers/blob/b67ab1c118e092139f1344443ddbe6da8048e02a/.buildkite/scripts/dra-publish.sh#L25-L26

I pushed a simple enhancement that also removes +... in https://github.com/elastic/elastic-stack-installers/pull/281/commits/ba0fc9552a696996f4dd078c40539eaba01c2e00
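
The idea of the filter, expressed as a Go sketch rather than the actual Bash in dra-publish.sh: drop everything from + onward, then any -SNAPSHOT suffix. The function names are hypothetical, and the artifacts-<workflow>.elastic.co/beats/latest/<version>.json shape is inferred from the failing URL above, not taken from the script.

```go
package main

import (
	"fmt"
	"strings"
)

// trimVersion drops build metadata ("+...") and then a "-SNAPSHOT" suffix,
// e.g. 8.14.0+build202407021002 -> 8.14.0.
func trimVersion(v string) string {
	v, _, _ = strings.Cut(v, "+")
	return strings.TrimSuffix(v, "-SNAPSHOT")
}

// latestBeatsURL builds the "latest" JSON URL for a DRA workflow such as "staging".
func latestBeatsURL(workflow, version string) string {
	return fmt.Sprintf("https://artifacts-%s.elastic.co/beats/latest/%s.json", workflow, trimVersion(version))
}

func main() {
	fmt.Println(latestBeatsURL("staging", "8.14.0+build202407011002"))
}
```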

However, on second thought, the work we are doing here is ONLY about elastic-agent. So I wonder: would it be better to simply skip invoking the release manager with --dependency: beats... (https://github.com/elastic/elastic-stack-installers/blob/b67ab1c118e092139f1344443ddbe6da8048e02a/.buildkite/scripts/dra-publish.sh#L60), so that it only pushes an elastic-agent artifact?

@DaveSys911, thoughts? Can you advise whether invoking the release manager without the above option will work for this scenario?

DaveSys911 commented 2 months ago

@dliappis I think you should avoid invoking it at all and use bk (Buildkite) API uploads instead for this workflow. The independent agent release workflow is slightly different, and we don't lifecycle agent artifacts using DRA like we normally do.

dliappis commented 2 months ago

@DaveSys911 Could you please elaborate? Do you mean we shouldn't be invoking the release manager tool at all, but instead upload the artifact generated in the previous step somewhere? If yes, where to?

DaveSys911 commented 2 months ago

Precisely: instead of invoking the release manager, use the Buildkite artifacts API. It has its own hosted storage, and the upstream job can fetch the artifact using the BUILD_ID. We have a similar workflow for the elastic-agent package jobs; see the example here.
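
A hedged Go sketch of that upload/download round trip through the buildkite-agent CLI, using its artifact upload and artifact download subcommands (the latter with --build to select the build that uploaded the files). The wrapper functions, the .msi path and how the upstream job learns the BUILD_ID are all assumptions, since the exact path and JSON layout were still open at this point. main only prints the commands as a dry run.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// uploadArgs: `buildkite-agent artifact upload <glob>`, run by the job that builds the MSI.
func uploadArgs(glob string) []string {
	return []string{"artifact", "upload", glob}
}

// downloadArgs: `buildkite-agent artifact download --build <id> <query> <dest>`,
// run by the upstream job to fetch artifacts uploaded by another build.
func downloadArgs(buildID, query, dest string) []string {
	return []string{"artifact", "download", "--build", buildID, query, dest}
}

// run executes buildkite-agent with the given arguments, streaming its output.
func run(args []string) error {
	cmd := exec.Command("buildkite-agent", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	// Hypothetical artifact location and build ID source; both were still to be decided.
	glob := "bin/out/elastic-agent/*.msi"
	buildID := os.Getenv("BUILD_ID")
	for _, args := range [][]string{uploadArgs(glob), downloadArgs(buildID, glob, ".")} {
		// A real job would call run(args) here instead of printing.
		fmt.Println("buildkite-agent", strings.Join(args, " "))
	}
}
```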

dliappis commented 2 months ago

Thank you. As discussed offline, there are many details here that need clarification, e.g. the structure of the JSON file and the exact path, to name a few, as well as ownership of the work. We'll discuss it in a meeting later today.