Closed niedbalski closed 3 months ago
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
Got a limited POC going now that does all this in a single repo with an action: it pushes packages to S3 and images to GHCR. These are all staging targets; the next stage is to test and then "bless" them, i.e. release.
Further discussion with @niedbalski has clarified a few things:
@niedbalski @patrick-stephens
S3 should not be used for releases. Many users and customers have restricted access to S3 buckets and have whitelisted the fluentbit domains so they can mirror the repos locally. We should continue using the native repos.
@edsiper @patrick-stephens
S3 can handle custom domains, so the domain mapping shouldn't change: any existing whitelist for packages.fluentbit.io and apt.fluentbit.io should keep working. In fact, we are aiming for the release bucket to keep exactly the same layout/structure.
Enabling S3 has many benefits for us, including CDN, replication, backup and simpler releases.
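To illustrate the "same layout" point, here is a minimal sketch, assuming the bucket key is simply the path part of the existing URL. The function name and the example path are hypothetical and only for illustration; the real bucket layout may differ.

```python
from urllib.parse import urlparse


def bucket_key_for(url: str) -> str:
    """Return the object key a URL would map to if the bucket mirrors
    the existing repo layout one-to-one (assumption, not confirmed)."""
    # Drop scheme and host; keep the path without its leading slash.
    return urlparse(url).path.lstrip("/")


# Hypothetical example path under one of the existing domains.
print(bucket_key_for("https://packages.fluentbit.io/ubuntu/focal/dists/focal/InRelease"))
```

With a mapping like this, a client that has whitelisted the domain keeps requesting the same URLs and only the backend serving them changes.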
Current plan therefore is to use a parallel workflow: we maintain the current process but also start producing the S3 release bucket so we can evaluate it. We also need to keep build times low, possibly by using a self-hosted runner.
@patrick-stephens
Here is my take for testing on top of staging:
Docker smoke test on 2 archs (amd64/arm64) with multiarch images.
Kubernetes smoke test (kind or k3s)
Agreed, I think for golden config I'll add a dummy input & stdout output to exercise the pipeline a bit. This is what I've done previously and then you can easily check for the expected output too. Eventually we can evolve this to do more if we want.
In fact, the default config might be fine - it's a shame that the server is not defaulted to running (I know people get tripped up on the helm chart healthchecks by this). It does CPU and stdout already.
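As a rough sketch of the "check for the expected output" step: assuming the golden config makes the dummy input emit a known marker string, the smoke test only has to find that marker in the captured container logs. The marker value, function names and sample log lines below are hypothetical.

```python
# Hypothetical marker we would put in the dummy input's message so it is
# easy to find in the logs; not taken from any real config.
EXPECTED_MARKER = "smoke-test"


def count_matching_records(log_text: str, marker: str = EXPECTED_MARKER) -> int:
    """Count captured log lines that contain the marker."""
    return sum(1 for line in log_text.splitlines() if marker in line)


def smoke_test_passed(log_text: str, minimum: int = 1,
                      marker: str = EXPECTED_MARKER) -> bool:
    """Pass when at least `minimum` records made it through the pipeline."""
    return count_matching_records(log_text, marker) >= minimum


# Illustrative log text only; real stdout output formatting may differ.
sample = "startup banner\nrecord: smoke-test\nrecord: smoke-test\n"
print(smoke_test_passed(sample))
```

In practice `log_text` would be whatever the test harness captures from the running container, for example the output of `docker logs`.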
Staging build is almost there now, just resolving some GPG signing issues but should present an S3 bucket with all the repos set up correctly. Container images built, scanned (Trivy + Dockle) and signed (Cosign) before staging to ghcr.io.
Container testing as per the above is in place: verify each architecture image locally, then use the Helm chart to verify it in a K8S deployment (whatever is the default in KIND when run). Package verification is in progress using kitchen-dokken: OS-based images for each target have the package installed and then we verify the service is running.
We will also look to trigger downstream integration and soak tests in staging to verify more things. @niedbalski I'll add workflow_call and workflow_dispatch to https://github.com/calyptia/fluent-bit-ci/blob/main/.github/workflows/main-gcp.yaml. We then need to set up the soak test to give some level of automatic verification, but with manual approval for release.
We should incorporate the suggestions from here: https://github.com/fluent/fluent-bit/issues/4389
In regards to integration testing:
[0] https://github.com/calyptia/fluent-bit-ci/blob/main/.github/workflows/main-gcp.yaml#L7
@patrick-stephens As a reference for the build/release to staging workflows.
For 4, that is covered by the private mirror due to the security concerns.
Need to add resilience and performance testing: https://github.com/fluent/fluent-bit/discussions/4390
Need to support package downgrade as well, i.e. official --> staging --> official, with the service still working afterwards. More distributions need to be tested too.
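One way to picture the official --> staging --> official check is as a fixed sequence of install steps per distribution, each followed by a service check. This is only a sketch of the test plan; the distribution names and step wording are hypothetical and no real package commands are run.

```python
# Order of repositories to install from; the round trip ends back on official.
REPO_SEQUENCE = ["official", "staging", "official"]


def downgrade_plan(distros):
    """Build the ordered list of (distro, step) pairs to execute."""
    plan = []
    for distro in distros:
        for repo in REPO_SEQUENCE:
            plan.append((distro, f"install from {repo}"))
            # After every install the service must still be running.
            plan.append((distro, "verify service is running"))
    return plan


# Hypothetical distribution names for illustration.
for distro, step in downgrade_plan(["ubuntu-20.04", "centos-7"]):
    print(distro, "->", step)
```

Adding another distribution then only means extending the input list.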
Working on adding the release promotion job now:
Packages (RPM + Deb) look OK now, working on the container release next.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
Is this OK to close?
Problem Description
The current workflow for building packages is mostly manual. We have some automated testing in place, namely this workflow [0]. Publication isn't automated, and we don't have a staging repository to test installs and upgrades before they reach the release bucket.
Proposed solution
[0] https://github.com/fluent/fluent-bit/blob/master/.github/workflows/build-release.yaml
Known Limitations