camunda / camunda-platform-helm

Camunda Platform 8 Self-Managed Helm charts
https://docs.camunda.io/docs/self-managed/overview/
Apache License 2.0

[ISSUE] Recreate the OpenShift CI cluster to get routes working #2488

Open hamza-m-masood opened 4 days ago

hamza-m-masood commented 4 days ago

Describe the issue:

This issue is a continuation of https://github.com/camunda/distribution/issues/314. I updated the OpenShift CI cluster in the hope that the routes operator would be reinstalled. That did not happen: the route objects are still not being generated from the ingress objects.

I will now try to replace the OpenShift CI cluster with a newly created one.

Actual behavior:

Currently, the routes operator is not working in the existing OpenShift cluster.

Expected behavior:

The goal is to have the routes operator working in the new OpenShift cluster that will replace the existing cluster.

How to reproduce:

Create any helm deployment with ingress enabled. You will see that the route objects are not generated by default.
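To make the symptom easier to spot, here is a minimal sketch that lists ingress objects in a namespace that have no generated route. It assumes that routes generated from an ingress carry an ownerReference naming that ingress; the function name `ingresses_without_route`, the `OC` variable, and the `demo` namespace are hypothetical and only used for illustration.

```shell
# CLI used to talk to the cluster; override OC to use kubectl or a stub.
OC="${OC:-oc}"

# Print the name of every ingress in namespace $1 that no route refers to
# as its owner. Assumes generated routes have an ownerReference to the ingress.
ingresses_without_route() {
  ns="$1"
  # Names of all objects that own a route in this namespace.
  owners="$("$OC" get routes -n "$ns" -o jsonpath='{.items[*].metadata.ownerReferences[*].name}')"
  for ing in $("$OC" get ingress -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
    case " $owners " in
      *" $ing "*) ;;          # a route owned by this ingress exists
      *) echo "$ing" ;;       # no route found for this ingress
    esac
  done
}

# Example (not run here): ingresses_without_route demo
```

An empty result would suggest that every ingress in the namespace has a matching route.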

hamza-m-masood commented 2 days ago

At first, I wanted to create another ROSA cluster in another AWS region, for example eu-west-1, to reduce downtime as much as possible by doing a blue/green deployment. I then realized this is not needed: no team other than the distribution team uses the OpenShift cluster, and one or two hours of downtime is acceptable.

hamza-m-masood commented 2 days ago

I need to remember to change these values in the GitHub Actions workflow to get the integration tests to work again: https://github.com/camunda/camunda-platform-helm/blob/97b781c5cbc9f3f1cdd2a003a9ff48530c471e0f/.github/workflows/test-integration-template.yaml#L202C1-L204C76

hamza-m-masood commented 2 days ago

I will follow the docs here to update the cluster: https://github.com/camunda/distribution/blob/main/docs/bootstrap.md

hamza-m-masood commented 2 days ago

I pulled the distribution repo and I ran the following command on the current OpenShift cluster:

kustomize build --enable-helm  clusters/rosa-distro-ci-hcp | kubectl apply -f -

I got the following error:

Resource: "apiextensions.k8s.io/v1, Resource=customresourcedefinitions", GroupVersionKind: "apiextensions.k8s.io/v1, Kind=CustomResourceDefinition"
Name: "prometheuses.monitoring.coreos.com", Namespace: ""
for: "STDIN": error when patching "STDIN": CustomResourceDefinition.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

I tried to update the Prometheus version locally and apply the changes. I still get the same error.

I also tried using kubectl create instead of kubectl apply. This works, but it is not the best solution because the Kubernetes objects do not get the last-applied-configuration annotation. Since I am creating a new cluster and don't need the functionality of kubectl apply for the first install, I will move on.
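For context, client-side kubectl apply stores a copy of the whole object in the kubectl.kubernetes.io/last-applied-configuration annotation, so a very large CRD can exceed the 262144-byte annotation limit quoted in the error. A rough pre-check could look like the sketch below. The helper name `fits_in_annotation` is hypothetical, and the file size of a rendered manifest is only an approximation of the serialized object that kubectl would store.

```shell
# Kubernetes limit on the total size of an object's annotations, in bytes
# (the value quoted in the error message above).
ANNOTATION_LIMIT=262144

# Return 0 if the manifest file $1 is small enough to fit into the
# annotation, 1 otherwise. The file size only approximates the stored JSON.
fits_in_annotation() {
  size=$(( $(wc -c < "$1") ))
  [ "$size" -le "$ANNOTATION_LIMIT" ]
}

# Example (not run here), with a hypothetical file holding a single CRD:
# fits_in_annotation prometheus-crd.yaml || echo "too large for client-side apply"
```

One option that was not tried in this thread is kubectl apply --server-side, which tracks field ownership on the server and does not write the last-applied-configuration annotation.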

hamza-m-masood commented 1 day ago

I deleted the existing ROSA cluster and recreated it with these commands:

export CLUSTER_NAME="distro-ci-hcp"
export REGION="eu-central-1"
export OIDC_ID="$(rosa create oidc-config --mode auto --managed --yes -o json | jq -r '.id')"
export PUBLIC_SUBNET_ID="subnet-08ce0d8c578a68cc9" # From the VPC script.
export PRIVATE_SUBNET_ID="subnet-02f2cc64170b28027" # From the VPC script.

rosa create cluster --cluster-name "${CLUSTER_NAME}" \
    --region "${REGION}" \
    --oidc-config-id "${OIDC_ID}" \
    --subnet-ids "${PUBLIC_SUBNET_ID},${PRIVATE_SUBNET_ID}" \
    --hosted-cp \
    --enable-autoscaling \
    --min-replicas 2 \
    --max-replicas 32 \
    --sts --mode auto --yes

hamza-m-masood commented 1 day ago

I noticed a typo in the technical docs. I created a PR to fix it: https://github.com/camunda/distribution/pull/330

hamza-m-masood commented 1 day ago

Here is the output of the above command:

I: Using '683113427784' as billing account
I: To use a different billing account, add --billing-account xxxxxxxxxx to previous command
W: More than one Installer role found
I: Using arn:aws:iam::683113427784:role/ManagedOpenShift-HCP-ROSA-Installer-Role for the Installer role
I: Using arn:aws:iam::683113427784:role/ManagedOpenShift-HCP-ROSA-Worker-Role for the Worker role
I: Using arn:aws:iam::683113427784:role/ManagedOpenShift-HCP-ROSA-Support-Role for the Support role
I: Creating cluster 'distro-ci-hcp'
I: To view a list of clusters and their status, run 'rosa list clusters'
I: Cluster 'distro-ci-hcp' has been created.
I: Once the cluster is installed you will need to add an Identity Provider before you can login into the cluster. See 'rosa create idp --help' for more information.

Name:                       distro-ci-hcp
Domain Prefix:              distro-ci-hcp
Display Name:               distro-ci-hcp
ID:                         2ehsadntijksurh6fee11vbd1to2sk5b
External ID:                a29a08fa-e80d-48ca-903b-fe4504b8f69b
Control Plane:              ROSA Service Hosted
OpenShift Version:          4.14.38
Channel Group:              stable
DNS:                        Not ready
AWS Account:                683113427784
AWS Billing Account:        683113427784
API URL:
Console URL:
Region:                     eu-central-1
Availability:
 - Control Plane:           MultiAZ
 - Data Plane:              SingleAZ

Nodes:
 - Compute (Autoscaled):    2-32
 - Compute (current):       0
Network:
 - Type:                    OVNKubernetes
 - Service CIDR:            172.30.0.0/16
 - Machine CIDR:            10.0.0.0/16
 - Pod CIDR:                10.128.0.0/14
 - Host Prefix:             /23
 - Subnets:                 subnet-08ce0d8c578a68cc9, subnet-02f2cc64170b28027
EC2 Metadata Http Tokens:   optional
Role (STS) ARN:             arn:aws:iam::683113427784:role/ManagedOpenShift-HCP-ROSA-Installer-Role
Support Role ARN:           arn:aws:iam::683113427784:role/ManagedOpenShift-HCP-ROSA-Support-Role
Instance IAM Roles:
 - Worker:                  arn:aws:iam::683113427784:role/ManagedOpenShift-HCP-ROSA-Worker-Role
Operator IAM Roles:
 - arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-kube-system-capa-controller-manager
 - arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-kube-system-control-plane-operator
 - arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-kube-system-kms-provider
 - arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-openshift-cloud-network-config-controller-clo
 - arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-openshift-image-registry-installer-cloud-cred
 - arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-openshift-ingress-operator-cloud-credentials
 - arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-openshift-cluster-csi-drivers-ebs-cloud-crede
 - arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-kube-system-kube-controller-manager
Managed Policies:           Yes
State:                      waiting (Waiting for user action)
Private:                    No
Delete Protection:          Disabled
Created:                    Oct 21 2024 08:39:21 UTC
User Workload Monitoring:   Enabled
Details Page:               https://console.redhat.com/openshift/details/s/2njwnlLwoJrXV17i7InzN42eMgJ
OIDC Endpoint URL:          https://oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k (Managed)
Audit Log Forwarding:       Disabled
External Authentication:    Disabled
Etcd Encryption:            Disabled

I: Preparing to create operator roles.
I: Creating roles using 'arn:aws:iam::683113427784:user/hamza.masood'
I: Attached trust policy to role 'distro-ci-hcp-b5g2-openshift-image-registry-installer-cloud-cred(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-openshift-image-registry-installer-cloud-cred)': {"Version": "2012-10-17", "Statement": [{"Action": ["sts:AssumeRoleWithWebIdentity"], "Effect": "Allow", "Condition": {"StringEquals": {"oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k:sub": ["system:serviceaccount:openshift-image-registry:cluster-image-registry-operator" , "system:serviceaccount:openshift-image-registry:registry"]}}, "Principal": {"Federated": "arn:aws:iam::683113427784:oidc-provider/oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k"}}]}
I: Created role 'distro-ci-hcp-b5g2-openshift-image-registry-installer-cloud-cred' with ARN 'arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-openshift-image-registry-installer-cloud-cred'
I: Attached policy 'ROSAImageRegistryOperatorPolicy(https://docs.aws.amazon.com/aws-managed-policy/latest/reference/ROSAImageRegistryOperatorPolicy)' to role 'distro-ci-hcp-b5g2-openshift-image-registry-installer-cloud-cred(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-openshift-image-registry-installer-cloud-cred)'

I: Attached trust policy to role 'distro-ci-hcp-b5g2-openshift-ingress-operator-cloud-credentials(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-openshift-ingress-operator-cloud-credentials)': {"Version": "2012-10-17", "Statement": [{"Action": ["sts:AssumeRoleWithWebIdentity"], "Effect": "Allow", "Condition": {"StringEquals": {"oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k:sub": ["system:serviceaccount:openshift-ingress-operator:ingress-operator"]}}, "Principal": {"Federated": "arn:aws:iam::683113427784:oidc-provider/oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k"}}]}
I: Created role 'distro-ci-hcp-b5g2-openshift-ingress-operator-cloud-credentials' with ARN 'arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-openshift-ingress-operator-cloud-credentials'
I: Attached policy 'ROSAIngressOperatorPolicy(https://docs.aws.amazon.com/aws-managed-policy/latest/reference/ROSAIngressOperatorPolicy)' to role 'distro-ci-hcp-b5g2-openshift-ingress-operator-cloud-credentials(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-openshift-ingress-operator-cloud-credentials)'

I: Attached trust policy to role 'distro-ci-hcp-b5g2-openshift-cluster-csi-drivers-ebs-cloud-crede(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-openshift-cluster-csi-drivers-ebs-cloud-crede)': {"Version": "2012-10-17", "Statement": [{"Action": ["sts:AssumeRoleWithWebIdentity"], "Effect": "Allow", "Condition": {"StringEquals": {"oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k:sub": ["system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator" , "system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-controller-sa"]}}, "Principal": {"Federated": "arn:aws:iam::683113427784:oidc-provider/oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k"}}]}
I: Created role 'distro-ci-hcp-b5g2-openshift-cluster-csi-drivers-ebs-cloud-crede' with ARN 'arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-openshift-cluster-csi-drivers-ebs-cloud-crede'
I: Attached policy 'ROSAAmazonEBSCSIDriverOperatorPolicy(https://docs.aws.amazon.com/aws-managed-policy/latest/reference/ROSAAmazonEBSCSIDriverOperatorPolicy)' to role 'distro-ci-hcp-b5g2-openshift-cluster-csi-drivers-ebs-cloud-crede(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-openshift-cluster-csi-drivers-ebs-cloud-crede)'

I: Attached trust policy to role 'distro-ci-hcp-b5g2-kube-system-kube-controller-manager(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-kube-system-kube-controller-manager)': {"Version": "2012-10-17", "Statement": [{"Action": ["sts:AssumeRoleWithWebIdentity"], "Effect": "Allow", "Condition": {"StringEquals": {"oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k:sub": ["system:serviceaccount:kube-system:kube-controller-manager"]}}, "Principal": {"Federated": "arn:aws:iam::683113427784:oidc-provider/oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k"}}]}
I: Created role 'distro-ci-hcp-b5g2-kube-system-kube-controller-manager' with ARN 'arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-kube-system-kube-controller-manager'
I: Attached policy 'ROSAKubeControllerPolicy(https://docs.aws.amazon.com/aws-managed-policy/latest/reference/ROSAKubeControllerPolicy)' to role 'distro-ci-hcp-b5g2-kube-system-kube-controller-manager(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-kube-system-kube-controller-manager)'

I: Attached trust policy to role 'distro-ci-hcp-b5g2-kube-system-capa-controller-manager(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-kube-system-capa-controller-manager)': {"Version": "2012-10-17", "Statement": [{"Action": ["sts:AssumeRoleWithWebIdentity"], "Effect": "Allow", "Condition": {"StringEquals": {"oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k:sub": ["system:serviceaccount:kube-system:capa-controller-manager"]}}, "Principal": {"Federated": "arn:aws:iam::683113427784:oidc-provider/oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k"}}]}
I: Created role 'distro-ci-hcp-b5g2-kube-system-capa-controller-manager' with ARN 'arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-kube-system-capa-controller-manager'
I: Attached policy 'ROSANodePoolManagementPolicy(https://docs.aws.amazon.com/aws-managed-policy/latest/reference/ROSANodePoolManagementPolicy)' to role 'distro-ci-hcp-b5g2-kube-system-capa-controller-manager(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-kube-system-capa-controller-manager)'

I: Attached trust policy to role 'distro-ci-hcp-b5g2-kube-system-control-plane-operator(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-kube-system-control-plane-operator)': {"Version": "2012-10-17", "Statement": [{"Action": ["sts:AssumeRoleWithWebIdentity"], "Effect": "Allow", "Condition": {"StringEquals": {"oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k:sub": ["system:serviceaccount:kube-system:control-plane-operator"]}}, "Principal": {"Federated": "arn:aws:iam::683113427784:oidc-provider/oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k"}}]}
I: Created role 'distro-ci-hcp-b5g2-kube-system-control-plane-operator' with ARN 'arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-kube-system-control-plane-operator'
I: Attached policy 'ROSAControlPlaneOperatorPolicy(https://docs.aws.amazon.com/aws-managed-policy/latest/reference/ROSAControlPlaneOperatorPolicy)' to role 'distro-ci-hcp-b5g2-kube-system-control-plane-operator(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-kube-system-control-plane-operator)'

I: Attached trust policy to role 'distro-ci-hcp-b5g2-kube-system-kms-provider(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-kube-system-kms-provider)': {"Version": "2012-10-17", "Statement": [{"Action": ["sts:AssumeRoleWithWebIdentity"], "Effect": "Allow", "Condition": {"StringEquals": {"oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k:sub": ["system:serviceaccount:kube-system:kms-provider"]}}, "Principal": {"Federated": "arn:aws:iam::683113427784:oidc-provider/oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k"}}]}
I: Created role 'distro-ci-hcp-b5g2-kube-system-kms-provider' with ARN 'arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-kube-system-kms-provider'
I: Attached policy 'ROSAKMSProviderPolicy(https://docs.aws.amazon.com/aws-managed-policy/latest/reference/ROSAKMSProviderPolicy)' to role 'distro-ci-hcp-b5g2-kube-system-kms-provider(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-kube-system-kms-provider)'

I: Attached trust policy to role 'distro-ci-hcp-b5g2-openshift-cloud-network-config-controller-clo(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-openshift-cloud-network-config-controller-clo)': {"Version": "2012-10-17", "Statement": [{"Action": ["sts:AssumeRoleWithWebIdentity"], "Effect": "Allow", "Condition": {"StringEquals": {"oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k:sub": ["system:serviceaccount:openshift-cloud-network-config-controller:cloud-network-config-controller"]}}, "Principal": {"Federated": "arn:aws:iam::683113427784:oidc-provider/oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k"}}]}
I: Created role 'distro-ci-hcp-b5g2-openshift-cloud-network-config-controller-clo' with ARN 'arn:aws:iam::683113427784:role/distro-ci-hcp-b5g2-openshift-cloud-network-config-controller-clo'
I: Attached policy 'ROSACloudNetworkConfigOperatorPolicy(https://docs.aws.amazon.com/aws-managed-policy/latest/reference/ROSACloudNetworkConfigOperatorPolicy)' to role 'distro-ci-hcp-b5g2-openshift-cloud-network-config-controller-clo(https://console.aws.amazon.com/iam/home?#/roles/distro-ci-hcp-b5g2-openshift-cloud-network-config-controller-clo)'

I: Preparing to create OIDC Provider.
I: Creating OIDC provider using 'arn:aws:iam::683113427784:user/hamza.masood'
I: Created OIDC provider with ARN 'arn:aws:iam::683113427784:oidc-provider/oidc.op1.openshiftapps.com/2ehsa70cakefnit76ap8kri90hf2e36k'
I: To determine when your cluster is Ready, run 'rosa describe cluster -c distro-ci-hcp'.
I: To watch your cluster installation logs, run 'rosa logs install -c distro-ci-hcp --watch'.
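
Instead of re-running rosa describe cluster by hand, the wait could be scripted. The following is a minimal sketch: the helper names `wait_for_state` and `rosa_cluster_state` are hypothetical, and reading the state from the `.state` field of the JSON output is an assumption about the rosa CLI output format.

```shell
# Poll a command until it prints the wanted state.
# Usage: wait_for_state WANTED MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 once the state matches, 1 if MAX_TRIES is exhausted.
wait_for_state() {
  wanted="$1"
  tries="$2"
  delay="$3"
  shift 3
  i=0
  while [ "$i" -lt "$tries" ]; do
    state="$("$@" 2>/dev/null || true)"
    if [ "$state" = "$wanted" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Assumed helper: print the cluster state using the rosa CLI and jq.
# The '.state' field name is an assumption about the JSON output.
rosa_cluster_state() {
  rosa describe cluster -c "$1" -o json | jq -r '.state'
}

# Example (not run here): check every 30 seconds, up to 60 times.
# wait_for_state ready 60 30 rosa_cluster_state distro-ci-hcp
```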

hamza-m-masood commented 1 day ago

I applied the manifests successfully via kustomize.

hamza-m-masood commented 1 day ago

I changed the DISTRO_CI_OPENSHIFT_CLUSTER_URL to the new server URL and now the integration tests are running on the new cluster.

hamza-m-masood commented 1 day ago

I still don't see the new routes being generated even though I recreated the cluster ☹️ The namespace openshift-route-controller-manager is still empty.

I guess it makes sense for the namespace to be empty because Red Hat might manage it differently when the HCP deployment option is chosen. For example, even though the namespace is empty, I can run this command:

oc get events -n openshift-route-controller-manager
LAST SEEN   TYPE     REASON           OBJECT                              MESSAGE
63m         Normal   LeaderElection   lease/openshift-route-controllers   openshift-route-controller-manager-7895f548d8-jnqdc_61806fd5-1076-40bf-8c31-16a24f745a0e became leader

I get output showing a leader election, but I am not able to grab the logs or describe the pod.

Maybe the AWS subnet configuration is incorrect? Maybe it is something to do with the ingress objects not getting proper access. I will check...

hamza-m-masood commented 1 day ago

After a lot of trial and error I found out that routes do get created but only on well-known ports such as 8080 or 443.

The openshift-route-controller-manager namespace is typically empty because the controller responsible for managing routes is running in Red Hat’s AWS account, not in the customer’s cluster.

hamza-m-masood commented 1 hour ago

After some more testing, I realized that the route does, in fact, get generated with non-well-known ports. Instead, the route will not be created if there is an error in any of the following:

  1. Kubernetes service configuration (for example, defining a wrong target port)
  2. Ingress configuration (for example, defining the wrong port to route to a service)
  3. TLS configuration in the ingress (for example, referencing a non-existing TLS secret in the ingress object)
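
The third cause can be checked mechanically. Below is a minimal sketch that reports TLS secrets referenced by an ingress but missing from its namespace; it does not cover the service and port checks from the first two items. The function name `check_tls_secrets`, the `OC` variable, and the example names are hypothetical.

```shell
# CLI used to talk to the cluster; override OC to use kubectl or a stub.
OC="${OC:-oc}"

# Report every TLS secret referenced by ingress $2 in namespace $1 that
# does not exist in that namespace. Returns 1 if any secret is missing.
check_tls_secrets() {
  ns="$1"
  ing="$2"
  rc=0
  # Secret names listed under spec.tls of the ingress (may be empty).
  secrets="$("$OC" get ingress "$ing" -n "$ns" -o jsonpath='{.spec.tls[*].secretName}')"
  for s in $secrets; do
    if ! "$OC" get secret "$s" -n "$ns" >/dev/null 2>&1; then
      echo "missing TLS secret: $s"
      rc=1
    fi
  done
  return "$rc"
}

# Example (not run here), with hypothetical names:
# check_tls_secrets my-pr-namespace my-ingress
```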

Currently, the external-secrets operator is having a problem. So, I am looking at how the TLS secret can be replicated into a new namespace created from Pull Requests.