Closed: T-Kukawka closed this issue 7 months ago.
Migration completed successfully; the BigMac applications are deployed on the target CAPI MC golem and function as expected.
...
Deleted vintage au6g2 node pool ASG.
Executed the following command to apply the non-default apps to the CAPI MC via the external tool:
app-migration-cli apply -s garfish -d golem -n spyros02 -o org-capa-migration-testing
Connected to gs-garfish, k8s server version v1.24.17
Connected to gs-golem, k8s server version v1.24.16
All prerequisites are found on the new MC for app migration
Applying all non-default APP CRs to MC
All non-default apps applied successfully.
Apps (0) applied successfully to golem-spyros02
Finalizer removed on NS: garfish/spyros02
Finished migrating cluster spyros02 to CAPI infrastructure
On the new cluster we can see that everything migrated smoothly:
k get apps -n org-capa-migration-testing | grep spyros02
spyros02 0.60.0 18m 13m deployed
spyros02-app-operator 6.10.0 18m 18m deployed
spyros02-athena 1.12.1 63s 59s deployed
spyros02-aws-pod-identity-webhook 1.14.1 18m 14m deployed
spyros02-capi-node-labeler 0.5.0 18m 11m deployed
spyros02-cert-exporter 2.8.5 18m 15m deployed
spyros02-cert-manager 3.7.0 18m 11m deployed
spyros02-chart-operator 3.1.0 18m 11m deployed
spyros02-chart-operator-extensions 18m already-exists
spyros02-cluster-autoscaler 1.27.3-gs3 18m 11m deployed
spyros02-default-apps 0.45.1 18m 18m deployed
spyros02-default-ingress-nginx 3.5.1 63s 58s deployed
spyros02-default-rbac-bootstrap 0.2.1 63s 43s deployed
spyros02-dex-app 1.42.8 62s 1s deployed
spyros02-etcd-k8s-res-count-exporter 1.8.0 18m 14m deployed
spyros02-external-dns 2.42.0 18m 11m deployed
spyros02-grafana-agent 0.3.2 18m 11m deployed
spyros02-kube-prometheus-stack 8.1.1 18m 13m deployed
spyros02-kyverno 0.16.4 18m 15m deployed
spyros02-kyverno-policies 0.20.2 18m 11m deployed
spyros02-kyverno-policy-operator 0.0.6 18m 11m deployed
spyros02-metrics-server 2.4.2 18m 13m deployed
spyros02-net-exporter 1.18.2 18m 15m deployed
spyros02-node-exporter 1.18.2 18m 11m deployed
spyros02-observability-bundle 1.0.0 18m 18m deployed
spyros02-prometheus-agent 0.6.6 18m 11m deployed
spyros02-prometheus-operator-crd 8.0.0 18m 11m deployed
spyros02-promtail 1.4.1 18m 11m deployed
spyros02-security-bundle 1.5.0 18m 18m deployed
spyros02-teleport-kube-agent 0.7.0 18m 15m deployed
spyros02-vertical-pod-autoscaler 4.6.0 18m 11m deployed
There is only one issue, with certificate generation by cert-manager-app. It is related to external-dns, which, per the following logs, can't retrieve credentials:
time="2024-02-06T21:28:42Z" level=info msg="Instantiating new Kubernetes client"
time="2024-02-06T21:28:42Z" level=info msg="Using inCluster-config based on serviceaccount-token"
time="2024-02-06T21:28:42Z" level=info msg="Created Kubernetes client https://172.31.0.1:443"
time="2024-02-06T21:29:19Z" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements\n\tstatus code: 400, request id: b615ff15-46d4-469d-960a-a028d7410be7"
I have made adjustments in the tracking ticket as well as in the teams' tickets regarding the CAPA and migration testing instructions.
TL;DR: Testing of CAPA/migration is moved from gazelle to grizzly.

Initially gazelle was chosen to test the CAPA migration as it is a production MC, meaning the most stable one. However, this has resulted in unforeseen pages towards the kaas-cloud on-call that we would like to limit. We do recognise the pages and are actively working on testing, but such pages are just a distraction from the operational clusters onto which most teams have migrated the GS production workloads. Taking all the facts into consideration, we have decided that it would be best to move the testing to grizzly, which is the stable-testing installation. That installation primarily runs e2e tests and is treated as stable (no changes on the MCs). Thanks for understanding, and let us know if something is not working.
@gawertm can we close this?
There was an issue with external-dns which most likely was not related to our apps. We wanted to monitor whether the external-dns issue got fixed before closing. If that's the case, we can close, yes.
@gawertm I believe this was the issue? https://github.com/giantswarm/giantswarm/issues/29985
Yes, I think so; maybe @ssyno can confirm, he worked on that.
The time has come to start testing final releases as well as the migration from Vintage v20 to CAPA. We have created a dedicated Vintage MC, garfish, to perform any Vintage or migration testing, for stability purposes. The dedicated CAPA cluster for migration will be the CAPA stable-testing MC, grizzly. We would kindly ask all teams to perform comprehensive tests for 3 use cases, ordered by priority in case they can't all be performed at once.
1. Vintage AWS v20

Cluster creation on garfish, giantswarm Organization.

This is the last release of Vintage, containing Kubernetes 1.25. Kubernetes 1.25 introduces a breaking change by removing PSPs from its API, meaning that all workloads will have to comply with the global toggle disabling PSPs, as in the 19.3.x release. Prior to making the v20 release available to customers, we need to validate that all applications are running smoothly. The Vintage tests are standard as always: you just create a v20 cluster and validate your applications. A separate, stable MC in this case guarantees stability and no manual changes in the release. Please mark v20 testing in the main issue as well. A cluster-creation sketch follows below.
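For the cluster creation itself, a minimal sketch using kubectl-gs; the name, description, and release number are placeholders, so check kubectl gs template cluster --help against garfish for the exact flags:

# Template the Vintage AWS cluster CRs against the v20 release (values are placeholders):
kubectl gs template cluster \
  --provider aws \
  --organization giantswarm \
  --name v20test \
  --release 20.0.0 \
  --description "v20 validation cluster" > cluster.yaml

# Review the manifests, then apply them while logged in to garfish:
kubectl apply -f cluster.yaml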
2. CAPA 0.60.0

Cluster creation on grizzly, giantswarm Organization. Be aware that this is a production MC, so it will page everyone; in practice, any CAPA MC should work for this test.

Starting with cluster-aws v0.60.0 and default-apps-aws v0.45.1 onwards, CAPA supports Kubernetes 1.25 with all the features needed to run our workloads in the same manner as on Vintage clusters. For testing, please always use the latest cluster-aws and default-apps-aws releases.
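Cluster creation here follows the same pattern; a sketch assuming kubectl-gs accepts capa as the provider on grizzly, with name and description as placeholders (the cluster-aws and default-apps-aws versions come from the latest releases mentioned above):

# Template a CAPA cluster (values are placeholders):
kubectl gs template cluster \
  --provider capa \
  --organization giantswarm \
  --name capatest \
  --description "CAPA 1.25 validation cluster" > capa-cluster.yaml

# Review, then apply on grizzly:
kubectl apply -f capa-cluster.yaml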
3. Vintage to CAPA migration

Cluster creation for migration on garfish, capa-migration-testing Organization. Clusters will be migrated to grizzly, capa-migration-testing Organization.

Phoenix and Honeybadger have worked extensively on making the migration as smooth as possible. The migration-cli has been introduced to orchestrate the migration of apps as well as infrastructure. The main point here is to discover whether your application, and any custom configurations that could be applied by customers, are migrated properly. The migration-cli has been extended to facilitate easy testing for all teams at Giant Swarm. Please follow the requirements as well as the procedure described in the tests section of the tool. In case of any issue with infrastructure, ping Phoenix; if the app/configmap migration faces any issues or inconsistencies, ping Honeybadger.