IBM / ibm-pak

IBM Catalog Management Plug-in for IBM Cloud Paks 1.0
https://www.ibm.com/docs/en/cpfs?topic=environment-catalog-management-plug-in-pak-plugin

amd64 pre-req check fails -- despite all nodes having amd64 arch #26

Closed: rolivieri closed this issue 1 year ago

rolivieri commented 1 year ago
-------------Installing catalog source-------------
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-kong-management-catalog
  namespace: openshift-marketplace
spec:
  displayName: IBM Kong Management Catalog
  publisher: IBM
  sourceType: grpc
  image: api.agates.cp.fyre.ibm.com:8443/cpopen/ibm-management-kong-catalog@sha256:6847ec5373b12ba3f6e3d44b92e61804747a8bca5755dcf53b2c48340fd61755
catalogsource.operators.coreos.com/ibm-kong-management-catalog unchanged
done
[✓] CASE launch script completed successfully
-------------Installing dependent catalog source: /root/.ibm-pak/data/cases/ibm-cp-waiops/1.5.2/ibm-watson-aiops-ui-operator-case-1.5.2.tgz-------------
Welcome to the CASE launcher
Attempting to retrieve and extract the CASE from the specified location
[✓] CASE has been retrieved and extracted
Attempting to validate the CASE
[✓] CASE has been successfully validated
Attempting to locate the launch inventory item, script, and action in the specified CASE
[✓] Found the specified launch inventory item, action, and script for the CASE
Attempting to check the cluster and machine for required prerequisites for launching the item
Checking for required prereqs...

Error: Parsing the actions prereqs failed

Prerequisite                                                                       Result
Kubernetes version is 1.19.0 or greater                                            true
Cluster has at least one amd64 node                                                false
OpenShift Container Platform Kubernetes version is 1.19.0 or greater               true
Client has oc version 4.4.0 or greater                                             true
Client has cloudctl version v3.4.x                                                 true
CustomResourceDefinition must have a group and version of apiextensions.k8s.io/v1  true

Required prereqs result: FAILED

Additional information on prereqs for the installCatalog action
================================================================
The installCatalog action must be run on either OpenShift Container Platform on amd64 Linux.
The minimum level of Kubernetes on each platform are described in the CASE prerequisites.
The client must have oc installed to execute the launcher script.

[ERROR] installing dependent catalog for '/root/.ibm-pak/data/cases/ibm-cp-waiops/1.5.2/ibm-watson-aiops-ui-operator-case-1.5.2.tgz' failed
Error: Launch script failed due to: exit status 1

Error: * An error was encountered executing process
    - Command: bash
    - Args: -c /root/.ibm-pak/data/launch/ibm-cp-waiops/1.5.2/cpwaiopsSetup/main-launch.sh
    - Err: exit status 1

However, all worker and master nodes have amd64 architecture. For example:

# oc describe node worker0.agates.cp.fyre.ibm.com
Name:               worker0.agates.cp.fyre.ibm.com
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=worker0.agates.cp.fyre.ibm.com
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        csi.volume.kubernetes.io/nodeid:
                      {"rook-ceph.cephfs.csi.ceph.com":"worker0.agates.cp.fyre.ibm.com","rook-ceph.rbd.csi.ceph.com":"worker0.agates.cp.fyre.ibm.com"}
                    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-7b2635e55eb299932e92beac07e88c3a
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-7b2635e55eb299932e92beac07e88c3a
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/ssh: accessed
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 10 Mar 2023 10:08:00 -0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  worker0.agates.cp.fyre.ibm.com
  AcquireTime:     <unset>
  RenewTime:       Sat, 11 Mar 2023 09:22:43 -0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sat, 11 Mar 2023 09:20:53 -0800   Fri, 10 Mar 2023 12:34:07 -0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sat, 11 Mar 2023 09:20:53 -0800   Fri, 10 Mar 2023 12:34:07 -0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sat, 11 Mar 2023 09:20:53 -0800   Fri, 10 Mar 2023 12:34:07 -0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sat, 11 Mar 2023 09:20:53 -0800   Fri, 10 Mar 2023 12:34:17 -0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.22.43.218
  Hostname:    worker0.agates.cp.fyre.ibm.com
Capacity:
  cpu:                16
  ephemeral-storage:  261608428Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             56338624Ki
  pods:               250
Allocatable:
  cpu:                15500m
  ephemeral-storage:  240024585022
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             55187648Ki
  pods:               250
System Info:
  Machine ID:                              19dd32f091384c06b83864be309c83f2
  System UUID:                             19dd32f0-9138-4c06-b838-64be309c83f2
  Boot ID:                                 3a1bb452-5929-46bd-b74e-de7a6d2cd57b
  Kernel Version:                          4.18.0-305.76.1.el8_4.x86_64
  OS Image:                                Red Hat Enterprise Linux CoreOS 410.84.202302090253-0 (Ootpa)
  Operating System:                        linux
  Architecture:                            amd64
  Container Runtime Version:               cri-o://1.23.5-5.rhaos4.10.gitd9dec98.el8
  Kubelet Version:                         v1.23.12+8a6bfe4
  Kube-Proxy Version:                      v1.23.12+8a6bfe4
Non-terminated Pods:                       (30 in total)
  Namespace                                Name                                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                                ----                                                               ------------  ----------  ---------------  -------------  ---
  openshift-cluster-node-tuning-operator   tuned-r6ghj                                                        10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         21h
  openshift-console                        downloads-7dc8c885d8-xbnqn                                         10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         20h
  openshift-dns                            dns-default-7wg7w                                                  60m (0%)      0 (0%)      110Mi (0%)       0 (0%)         20h
  openshift-dns                            node-resolver-hshg9                                                5m (0%)       0 (0%)      21Mi (0%)        0 (0%)         21h
  openshift-image-registry                 node-ca-mv6s5                                                      10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         21h
  openshift-ingress-canary                 ingress-canary-2trdk                                               10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         21h
  openshift-ingress                        router-default-564f7ff7cd-jkdkj                                    100m (0%)     0 (0%)      256Mi (0%)       0 (0%)         20h
  openshift-kube-storage-version-migrator  migrator-597c645589-fwd46                                          10m (0%)      0 (0%)      200Mi (0%)       0 (0%)         20h
  openshift-machine-config-operator        machine-config-daemon-68mlk                                        40m (0%)      0 (0%)      100Mi (0%)       0 (0%)         20h
  openshift-monitoring                     alertmanager-main-1                                                9m (0%)       0 (0%)      120Mi (0%)       0 (0%)         20h
  openshift-monitoring                     grafana-646ffd4688-vj7x9                                           6m (0%)       0 (0%)      99Mi (0%)        0 (0%)         20h
  openshift-monitoring                     kube-state-metrics-6694557dd7-dd54m                                4m (0%)       0 (0%)      110Mi (0%)       0 (0%)         20h
  openshift-monitoring                     node-exporter-v9b4j                                                9m (0%)       0 (0%)      47Mi (0%)        0 (0%)         21h
  openshift-monitoring                     openshift-state-metrics-6fb85b7448-5wk26                           3m (0%)       0 (0%)      72Mi (0%)        0 (0%)         20h
  openshift-monitoring                     prometheus-adapter-59468db6cb-2fg9f                                1m (0%)       0 (0%)      40Mi (0%)        0 (0%)         4h13m
  openshift-monitoring                     prometheus-k8s-1                                                   100m (0%)     0 (0%)      1104Mi (2%)      0 (0%)         20h
  openshift-multus                         multus-additional-cni-plugins-twzwm                                10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         21h
  openshift-multus                         multus-gp9j2                                                       10m (0%)      0 (0%)      65Mi (0%)        0 (0%)         21h
  openshift-multus                         network-metrics-daemon-d7m7k                                       20m (0%)      0 (0%)      120Mi (0%)       0 (0%)         21h
  openshift-network-diagnostics            network-check-source-7786b5887f-b2mjv                              10m (0%)      0 (0%)      40Mi (0%)        0 (0%)         20h
  openshift-network-diagnostics            network-check-target-wjvcn                                         10m (0%)      0 (0%)      15Mi (0%)        0 (0%)         21h
  openshift-sdn                            sdn-n7lm5                                                          110m (0%)     0 (0%)      220Mi (0%)       0 (0%)         21h
  rook-ceph                                csi-cephfsplugin-5dj2f                                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         22h
  rook-ceph                                csi-cephfsplugin-provisioner-56f79b4957-dn78v                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         20h
  rook-ceph                                csi-rbdplugin-lrnwx                                                0 (0%)        0 (0%)      0 (0%)           0 (0%)         22h
  rook-ceph                                rook-ceph-crashcollector-worker0.agates.cp.fyre.ibm.com-75prkxk    0 (0%)        0 (0%)      0 (0%)           0 (0%)         20h
  rook-ceph                                rook-ceph-mds-myfs-a-7dcc77b59-fnfpc                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         20h
  rook-ceph                                rook-ceph-mgr-b-85d996c95d-6tz5w                                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         20h
  rook-ceph                                rook-ceph-mon-a-7b44647959-6bnhq                                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         20h
  rook-ceph                                rook-ceph-osd-3-d7cc7d9b6-5t9mm                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         20h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                557m (3%)    0 (0%)
  memory             2879Mi (5%)  0 (0%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:              <none>
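Rather than describing nodes one at a time, the architecture of every node can be checked in one pass by parsing the output of `oc get nodes -o json`. The following is a minimal sketch, not part of ibm-pak: `node_architectures`, `has_amd64_node`, and `load_nodes_from_cluster` are hypothetical helpers. They read the `kubernetes.io/arch` label shown above and fall back to `status.nodeInfo.architecture` when the label is missing.

```python
import json
import subprocess
from typing import Any

ARCH_LABEL = "kubernetes.io/arch"


def node_architectures(node_list: dict[str, Any]) -> dict[str, str]:
    """Map each node name to its architecture.

    Uses the kubernetes.io/arch label first, then falls back to
    status.nodeInfo.architecture, then to "unknown".
    """
    result: dict[str, str] = {}
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        arch = meta.get("labels", {}).get(ARCH_LABEL)
        if arch is None:
            arch = (
                node.get("status", {})
                .get("nodeInfo", {})
                .get("architecture", "unknown")
            )
        result[meta.get("name", "<unnamed>")] = arch
    return result


def has_amd64_node(node_list: dict[str, Any]) -> bool:
    """True if at least one node reports amd64 (the prereq being checked)."""
    return any(arch == "amd64" for arch in node_architectures(node_list).values())


def load_nodes_from_cluster(timeout_seconds: float = 120.0) -> dict[str, Any]:
    """Hypothetical helper: run `oc get nodes -o json` and parse the result.

    Raises subprocess.TimeoutExpired if the API does not answer in time.
    """
    completed = subprocess.run(
        ["oc", "get", "nodes", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_seconds,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Sample data mirroring the labels from the worker0 output above.
    sample = {
        "items": [
            {
                "metadata": {
                    "name": "worker0.agates.cp.fyre.ibm.com",
                    "labels": {ARCH_LABEL: "amd64"},
                }
            }
        ]
    }
    print(node_architectures(sample))
    print(has_amd64_node(sample))
```

Against a live cluster, `has_amd64_node(load_nodes_from_cluster())` would give the same yes/no answer the prereq check is meant to produce, assuming `oc` is logged in.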
rolivieri commented 1 year ago

We found the cause of this problem by examining the /root/.ibm-pak/logs/case.log file.

The root cause is not in the ibm-pak code, which is working correctly.

Instead, the API of our OCP cluster is very slow to respond to clients, and that caused a timeout error. After we set the IBMPAK_HTTP_TIMEOUT environment variable to 120 seconds (the default is 20 seconds), the "Cluster has at least one amd64 node" prereq check succeeded and all catalog sources were installed.
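The issue does not show how ibm-pak implements the check, so the following is only a hypothetical Python sketch of the failure mode: if a client-side timeout is caught and treated as "no match", a slow API makes the amd64 check print `false` instead of a timeout error. It assumes IBMPAK_HTTP_TIMEOUT holds a plain number of seconds. `amd64_prereq`, `slow_api`, and the 45-second latency are made up for illustration.

```python
import os
from typing import Callable, Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 20.0  # default value cited in this issue


def http_timeout_from_env(env: Optional[Mapping[str, str]] = None) -> float:
    """Read the timeout from IBMPAK_HTTP_TIMEOUT, else use the 20 s default.

    Assumption: the variable is a plain number of seconds, e.g. "120".
    """
    env = os.environ if env is None else env
    raw = env.get("IBMPAK_HTTP_TIMEOUT", "").strip()
    return float(raw) if raw else DEFAULT_TIMEOUT_SECONDS


def amd64_prereq(fetch_arches: Callable[[float], list], timeout: float) -> bool:
    """Hypothetical prereq check that swallows timeouts.

    A timeout is reported as False, so it looks exactly like
    "no amd64 node" in the prereq table.
    """
    try:
        arches = fetch_arches(timeout)
    except TimeoutError:
        return False
    return "amd64" in arches


def slow_api(latency_seconds: float) -> Callable[[float], list]:
    """Simulated cluster API that answers after latency_seconds (no real sleep)."""
    def fetch(timeout: float) -> list:
        if latency_seconds > timeout:
            raise TimeoutError(f"API took {latency_seconds}s, limit was {timeout}s")
        return ["amd64", "amd64", "amd64"]
    return fetch


if __name__ == "__main__":
    api = slow_api(45)  # hypothetical slow cluster
    print(amd64_prereq(api, http_timeout_from_env({})))
    print(amd64_prereq(api, http_timeout_from_env({"IBMPAK_HTTP_TIMEOUT": "120"})))
```

With the default 20-second limit the simulated check fails, and with 120 seconds it passes, which matches the behavior described above. Reporting a timeout as its own "unknown" or error result, rather than `false`, would have pointed directly at the slow API.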

Therefore, closing this issue.