cloud-native-toolkit / planning

This is the planning repo to manage the cross-project Epics, Issues, Tasks and Bugs.

Toolkit installation fails on Red Hat OpenShift (OCP) on IBM Power #850

Open smcotugno opened 3 years ago

smcotugno commented 3 years ago

Describe the bug

I attempted to install the Cloud Native Toolkit (https://cloudnativetoolkit.dev/setup/fast-start/#installing-the-toolkit) and the installation failed. I ran the fast-start installation from my macOS machine. Below is the description of the toolkit job pod.

master $ oc describe po ibm-toolkit-g44kq -n default
Name:         ibm-toolkit-g44kq
Namespace:    default
Priority:     0
Node:         worker-3/10.3.158.100
Start Time:   Thu, 15 Jul 2021 21:38:33 -0700
Labels:       controller-uid=b9c59f3d-a159-4024-a4e8-ed971e471b5f
              job-name=ibm-toolkit
              run=ibm-toolkit
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{ "name": "", "interface": "eth0", "ips": [ "10.131.0.17" ], "default": true, "dns": {} }]
              k8s.v1.cni.cncf.io/networks-status:
                [{ "name": "", "interface": "eth0", "ips": [ "10.131.0.17" ], "default": true, "dns": {} }]
Status:       Failed
IP:           10.131.0.17
IPs:
  IP:           10.131.0.17
Controlled By:  Job/ibm-toolkit
Containers:
  toolkit:
    Container ID:  cri-o://a772f7ab464b8312754783e8c0e6138c23fdadd2ff79390a2c352e9d97cc6c0d
    Image:         quay.io/ibmgaragecloud/cli-tools:v0.10.0-lite
    Image ID:      quay.io/ibmgaragecloud/cli-tools@sha256:1d7d3409382b87665ed49ee4b26f01d8fd971d9a33c6ef28b922dd6d82436149
    Port:
    Host Port:
    Command:
      /bin/bash
      -c
      set -ex
      git clone -b master --depth=1 https://github.com/cloud-native-toolkit/ibm-garage-iteration-zero.git /source
      cd /source

      # Customize Installation

      # Add components
      # cp terraform/stages-ocp4/catalog/stage2-jaeger.tf terraform/stages-ocp4/

      # Remove any of these if you already have them available outside the cluster
      # rm terraform/stages-ocp4/stage2-argocd.tf
      # rm terraform/stages-ocp4/stage2-artifactory.tf
      # rm terraform/stages-ocp4/stage2-sonarqube.tf
      # rm terraform/stages-ocp4/stage2-pactbroker.tf

      # Remove Optional addons not utilize during default pipeline runs
      # rm terraform/stages-ocp4/stage2-swagger-editor.tf

      export TF_VAR_server_url=$(oc whoami --show-server)
      set +x
      export TF_VAR_login_token=$(oc whoami -t)
      STARTTIME=$(date +%s)
      ./terraform/runTerraform.sh --ocp -a
      DURATION=$(($(date +%s) - $STARTTIME))
      echo -e "\033[0;92m Toolkit install took: $(($DURATION / 60))m$(($DURATION % 60))s \033[0m"

    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 15 Jul 2021 21:38:37 -0700
      Finished:     Thu, 15 Jul 2021 21:38:37 -0700
    Ready:          False
    Restart Count:  0
    Environment Variables from:
      ibm-toolkit  ConfigMap with prefix 'TF_VAR_'  Optional: false
    Environment:    <none>
    Mounts:
      /source from source (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ibm-toolkit-token-xv4zs (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  source:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  ibm-toolkit-token-xv4zs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ibm-toolkit-token-xv4zs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason          Age  From               Message
  Normal  AddedInterface  20m  multus             Add eth0 [10.131.0.17/23]
  Normal  Pulled          20m  kubelet, worker-3  Container image "quay.io/ibmgaragecloud/cli-tools:v0.10.0-lite" already present on machine
  Normal  Created         20m  kubelet, worker-3  Created container toolkit
  Normal  Started         20m  kubelet, worker-3  Started container toolkit
  Normal  Scheduled       19m  default-scheduler  Successfully assigned default/ibm-toolkit-g44kq to worker-3
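The pod ran for less than a second and exited with code 1, and the describe output alone does not say why; oc logs job/ibm-toolkit -n default is the next place to look. Because the cluster runs on IBM Power, it is also worth confirming which architecture the nodes report, since a container built only for amd64 cannot execute on a ppc64le node. The sketch below uses hypothetical helper names (to_goarch, arch_matches are illustrative, not toolkit code) to normalize uname -m style names to the names Kubernetes reports in .status.nodeInfo.architecture and to compare two of them.

```shell
#!/usr/bin/env bash
# Sketch only: to_goarch and arch_matches are hypothetical helpers, not part of the toolkit.
# They translate `uname -m` machine names into the GOARCH-style names that Kubernetes
# reports for nodes and that container image manifests use.
set -eu

to_goarch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    ppc64le)       echo "ppc64le" ;;
    aarch64|arm64) echo "arm64" ;;
    s390x)         echo "s390x" ;;
    *)             echo "unknown:$1"; return 1 ;;
  esac
}

# Succeeds only when the node architecture ($1) and the image architecture ($2) match.
arch_matches() {
  node="$(to_goarch "$1")" || return 1
  image="$(to_goarch "$2")" || return 1
  [ "$node" = "$image" ]
}

# Against a live cluster (requires a logged-in oc), list each node and its architecture:
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'
```

On an OCP on Power cluster the node query should report ppc64le, so any image that only ships an amd64 variant will fail to start there.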

smcotugno commented 3 years ago

@seansund Update

  1. Pull Image Issue [RESOLVED]: Docker Hub (docker.io) now limits the number of image pulls by anonymous users. To avoid the limit, I upgraded my personal account (spent $60) and pull with my account credentials. I created the docker-token secret using my account and then had to add the pull secret to the service accounts of all the tools in the tools namespace. That solved this issue. Thanks to Hollis for helping as I learn more about the toolkit architecture.

  2. Possible Power Issue [NEED HELP from Cloud Native Toolkit Team]: This appears to be a Power issue. I get the same error in all the tool pods: standard_init_linux.go:219: exec user process caused: exec format error
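Item 1 above describes creating a docker-token pull secret and attaching it to every service account in the tools namespace. A minimal sketch of the attach step follows, assuming the secret name docker-token and the namespace tools from the comment; the helper names (run, link_pull_secret) and the DRY_RUN switch are hypothetical additions for illustration. oc secrets link with --for=pull is the standard OpenShift CLI way to add a secret to a service account's image pull secrets.

```shell
#!/usr/bin/env bash
# Sketch only: run, link_pull_secret and DRY_RUN are hypothetical helpers.
# Assumes the pull secret "docker-token" already exists in the "tools" namespace.
set -eu

NAMESPACE="${NAMESPACE:-tools}"
SECRET="${SECRET:-docker-token}"
DRY_RUN="${DRY_RUN:-0}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# The secret itself can be created once, for example with:
#   oc create secret docker-registry docker-token \
#     --docker-server=docker.io --docker-username=<user> \
#     --docker-password=<access-token> -n tools

# Add the pull secret to every service account named in the arguments.
link_pull_secret() {
  for sa in "$@"; do
    run oc secrets link "$sa" "$SECRET" --for=pull -n "$NAMESPACE"
  done
}

# Against a live cluster, link it to all service accounts in the namespace:
#   link_pull_secret $(oc get sa -n "$NAMESPACE" -o jsonpath='{.items[*].metadata.name}')
```

Note that this only fixes authentication for pulls; it does not help with the exec format error in item 2, which comes from the image architecture rather than from registry access.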

shankarathi07 commented 3 years ago

We checked the following tools: dashboard-developer-dashboard, pact-broker, sonarqube-sonarqube and swaggereditor. None of them supports the ppc64le architecture. See the attached files.

developer_dashboard.log pact_broker.log sonarqube.log swagger.log
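A check like this can be repeated for any other tool image by reading the image's manifest list and seeing which platforms it declares. The sketch assumes the raw manifest JSON comes from a tool such as skopeo inspect --raw docker://IMAGE (skopeo installed separately); list_archs and supports_arch are hypothetical helper names, and grep/sed is used instead of jq only to keep dependencies small. A single-architecture image has no manifest list, so the sketch prints nothing for it, and its architecture has to be read from the image config instead.

```shell
#!/usr/bin/env bash
# Sketch only: list_archs and supports_arch are hypothetical helpers.
# Input on stdin: raw manifest list / OCI index JSON, e.g. from
#   skopeo inspect --raw docker://<image>
set -eu

# Print each distinct "architecture" value found in the JSON, one per line.
list_archs() {
  { grep -o '"architecture"[[:space:]]*:[[:space:]]*"[^"]*"' || true; } \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u
}

# Succeed if the JSON on stdin declares the requested architecture ($1).
supports_arch() {
  list_archs | grep -x -- "$1" >/dev/null
}

# Example against a registry (requires skopeo and network access):
#   skopeo inspect --raw docker://<image> | supports_arch ppc64le && echo "ppc64le available"
```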

mjperrins commented 3 years ago

Do you have a recommendation on how to support Power? I am seeing a number of IBMers starting to look at the toolkit on Power; any help, PRs and contributions would be helpful.

smcotugno commented 3 years ago

@mjperrins Yeah, I think we are doing the first-of-a-kind (FOAK) work in this situation.

csantanapr commented 3 years ago

My recommendation would be to use two clusters: one amd64 cluster for the tools and the other for ppc64.

Then develop pipelines/tasks that go across the two clusters.
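One way to picture a pipeline step that spans the two clusters is a script that issues oc commands against two kubeconfig contexts. The context names tools-amd64 and app-ppc64, the deployment name and the image reference below are hypothetical; --context is the standard oc/kubectl global flag for selecting a context from the kubeconfig. This is a sketch of the idea, not how the toolkit's Tekton tasks are actually wired.

```shell
#!/usr/bin/env bash
# Sketch only: context names, helper names and DRY_RUN are hypothetical.
set -eu

TOOLS_CTX="${TOOLS_CTX:-tools-amd64}"  # cluster hosting SonarQube, Artifactory, etc.
APP_CTX="${APP_CTX:-app-ppc64}"        # Power cluster where the application runs
DRY_RUN="${DRY_RUN:-0}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run an oc command against the tools (amd64) cluster.
on_tools() { run oc --context="$TOOLS_CTX" "$@"; }

# Run an oc command against the application (ppc64) cluster.
on_app() { run oc --context="$APP_CTX" "$@"; }

# Example flow (hypothetical names): check the tools namespace on one cluster,
# then roll out an already-built ppc64le image on the other.
#   on_tools get pods -n tools
#   on_app set image deployment/my-app my-app=quay.io/example/my-app:1.0.0 -n my-app
#   on_app rollout status deployment/my-app -n my-app
```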

smcotugno commented 3 years ago

@csantanapr Question: the build task of the pipeline will build the container image on the amd64 cluster. Can that image be built for the ppc64 architecture, since the app we are building needs to run on ppc64?
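The thread leaves this question open, so the following is only a hedged sketch. Container build tools can target another architecture from an amd64 host, for example buildah bud --arch ppc64le or docker buildx build --platform linux/ppc64le. Two caveats apply: the base image must publish a ppc64le variant, and any RUN instruction in the Dockerfile then executes ppc64le binaries, which on an amd64 node requires QEMU user-mode emulation registered through binfmt_misc. The helper below (build_for_arch, a hypothetical name) only assembles the buildah command; the image reference is illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: build_for_arch and DRY_RUN are hypothetical helpers.
# Cross-architecture builds with RUN steps also need QEMU/binfmt_misc on the build node.
set -eu

DRY_RUN="${DRY_RUN:-0}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the Dockerfile in the current directory for a target architecture.
# $1 = image reference, $2 = target architecture (defaults to ppc64le).
build_for_arch() {
  image="$1"
  arch="${2:-ppc64le}"
  run buildah bud --arch "$arch" --os linux -t "$image" .
}

# Example (hypothetical image), followed by a push to the registry:
#   build_for_arch quay.io/example/my-app:1.0.0 ppc64le
#   buildah push quay.io/example/my-app:1.0.0
```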

smcotugno commented 3 years ago

@seansund @mjperrins @csantanapr

For the client dev team to move forward and leverage the work of the CNT, the plan is as follows:

  1. Request a second OCP cluster (x86-based) as a management cluster to install the CNT and leverage the server tools (e.g., SonarQube, Artifactory, the dashboard).
  2. To use the CNT sync/pipeline commands for the client work on the ppc64 OCP cluster, we will fork/branch the CNT https://github.com/IBM/ibm-garage-tekton-tasks repo and modify the tool/pipeline code so that it uses the pipeline operator on that cluster for Node.js development.
  3. For the build/push step, we will modify the Tekton code/configuration as needed to point to an external image registry (quay.io or docker.io) as a workaround.
  4. To leverage the CNT server tools, we will update the configuration and/or modify the CNT pipeline code in our branch.
  5. Contribution: when we are done, we plan to create a pull request to deliver our work back into the CNT.

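Step 3 replaces the cluster's internal registry with an external one such as quay.io. A sketch of the cluster-side preparation follows, assuming OpenShift Pipelines runs the tasks under its default pipeline service account; the secret name quay-push, the namespace my-app-dev and the helper names are hypothetical. The credentials are stored as a docker-registry secret and linked for both pull and mount, which is the usual way to expose registry credentials to Tekton task pods. Whether the toolkit's forked tasks pick them up depends on how they call the build tool, so treat this as a starting point rather than the branch's actual change.

```shell
#!/usr/bin/env bash
# Sketch only: namespace, secret name, helper names and DRY_RUN are hypothetical.
set -eu

NAMESPACE="${NAMESPACE:-my-app-dev}"
REGISTRY="${REGISTRY:-quay.io}"
SECRET="${SECRET:-quay-push}"
DRY_RUN="${DRY_RUN:-0}"

# Print the command instead of executing it when DRY_RUN=1
# (note: dry-run output includes the token).
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Compose a fully qualified image reference: $1 = org, $2 = repo, $3 = tag.
image_ref() { printf '%s/%s/%s:%s\n' "$REGISTRY" "$1" "$2" "$3"; }

# Store registry credentials and attach them to the pipeline service account.
setup_push_secret() {
  user="$1"
  token="$2"
  run oc create secret docker-registry "$SECRET" --docker-server="$REGISTRY" \
    --docker-username="$user" --docker-password="$token" -n "$NAMESPACE"
  run oc secrets link pipeline "$SECRET" --for=pull,mount -n "$NAMESPACE"
}
```

The image reference from image_ref would then be passed to whichever pipeline parameter the forked tasks use for the target image.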
smcotugno commented 3 years ago

I have set up the branch smc-850-pipeline-ppc64 for working on the changes.