barochiarg opened this issue 10 months ago (status: Open)
related issue: https://github.com/IBM/cloud-pak-deployer/issues/493
We found that MCG does not get deployed when STS is used for authentication:
time="2023-11-20T08:11:50Z" level=info msg="✅ RPC: system.update_endpoint_group() Response OK: took 0.3ms"
time="2023-11-20T08:11:50Z" level=info msg="✈️ RPC: redirector.register_to_cluster() Request: <nil>"
time="2023-11-20T08:11:50Z" level=info msg="✅ RPC: redirector.register_to_cluster() Response OK: took 0.2ms"
time="2023-11-20T08:11:50Z" level=info msg="❌ Not Found: BackingStore \"noobaa-default-backing-store\"\n"
time="2023-11-20T08:11:50Z" level=info msg="CredentialsRequest \"noobaa-aws-cloud-creds\" created. Creating default backing store on AWS objectstore" func=ReconcileDefaultBackingStore sys=openshift-storage/noobaa
time="2023-11-20T08:11:50Z" level=info msg="❌ Not Found: \"noobaa-aws-cloud-creds-secret\"\n"
time="2023-11-20T08:11:50Z" level=info msg="Secret \"noobaa-aws-cloud-creds-secret\" was not created yet by cloud-credentials operator. retry on next reconcile.." sys=openshift-storage/noobaa
time="2023-11-20T08:11:50Z" level=info msg="SetPhase: temporary error during phase \"Configuring\"" sys=openshift-storage/noobaa
time="2023-11-20T08:11:50Z" level=warning msg="⏳ Temporary Error: cloud credentials secret \"noobaa-aws-cloud-creds-secret\" is not ready yet" sys=openshift-storage/noobaa
time="2023-11-20T08:11:50Z" level=info msg="UpdateStatus: Done generation 2" sys=openshift-storage/noobaa
After some research, this turns out to be the same issue as #310. When trying to provision ODF, the default backing store is not created and the CredentialsRequest does not result in the creation of a secret for the NooBaa operator.
Steps to reproduce the issue:
export AWS_REGION=eu-central-1
export AWS_CFG_DIR=~/aws
export OCP_CLUSTER_NAME=aws-sts
export OCP_DOMAIN_NAME=deployer-demo.eu
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_access_key
mkdir -pv $AWS_CFG_DIR
mkdir -pv $AWS_CFG_DIR/downloads
curl -sLo $AWS_CFG_DIR/downloads/openshift-install-linux.tar.gz https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-4.12/openshift-install-linux.tar.gz
tar xvzf ${AWS_CFG_DIR}/downloads/openshift-install-linux.tar.gz -C ~/bin/
If you want to run the process multiple times, it is best to have a script that resets the AWS credentials to the permanent ones, after which you can generate new temporary credentials.
cat << EOF > $AWS_CFG_DIR/aws-reset-creds.sh
export KUBECONFIG=${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}/auth/kubeconfig
export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
unset AWS_SESSION_TOKEN
EOF
rm -rf $AWS_CFG_DIR/$OCP_CLUSTER_NAME
mkdir -pv $AWS_CFG_DIR/$OCP_CLUSTER_NAME
source $AWS_CFG_DIR/aws-reset-creds.sh
printf "\nexport AWS_ACCESS_KEY_ID=%s\nexport AWS_SECRET_ACCESS_KEY=%s\nexport AWS_SESSION_TOKEN=%s\n" $(aws sts assume-role \
--role-arn arn:aws:iam::872255850422:role/fk-sts-role \
--role-session-name OCPInstall \
--query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
--output text) > /tmp/sts-credentials.sh
source /tmp/sts-credentials.sh
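As a quick sanity check (an illustrative addition, not part of the original reproduction steps), verify that the assume-role output was parsed correctly. Temporary STS credentials always include a session token, so an empty AWS_SESSION_TOKEN means the printf/assume-role step failed:

```shell
# Sanity check (illustrative): temporary STS credentials always include a
# session token, so an empty AWS_SESSION_TOKEN means the assume-role output
# was not parsed correctly.
if [ -z "${AWS_SESSION_TOKEN}" ]; then
  echo "AWS_SESSION_TOKEN is empty - re-run the assume-role step" >&2
else
  echo "Temporary STS credentials loaded"
fi
```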
RELEASE_IMAGE=$(openshift-install version | awk '/release image/ {print $3}') && echo "Release image: ${RELEASE_IMAGE}"
CCO_IMAGE=$(oc adm release info --image-for='cloud-credential-operator' $RELEASE_IMAGE -a /tmp/ocp_pullsecret.json) && echo $CCO_IMAGE
pushd ~/bin
oc image extract $CCO_IMAGE --file="/usr/bin/ccoctl" -a /tmp/ocp_pullsecret.json
popd
chmod 775 ~/bin/ccoctl
oc adm release extract --credentials-requests --cloud=aws --to=${AWS_CFG_DIR}/credrequests --from=$RELEASE_IMAGE
ccoctl aws create-all --name=${OCP_CLUSTER_NAME} --region=${AWS_REGION} --credentials-requests-dir=${AWS_CFG_DIR}/credrequests --output-dir=${AWS_CFG_DIR}/credoutput
mkdir -p ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}
cat << EOF > ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}/install-config.yaml
apiVersion: v1
baseDomain: ${OCP_DOMAIN_NAME}
credentialsMode: Manual
metadata:
  name: ${OCP_CLUSTER_NAME}
controlPlane:
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      type: m5.xlarge
      zones:
      - ${AWS_REGION}a
  replicas: 3
compute:
- hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m5.4xlarge
      zones:
      - ${AWS_REGION}a
  replicas: 3
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: ${AWS_REGION}
fips: false
pullSecret: '$(cat /tmp/ocp_pullsecret.json)'
sshKey: $(cat ~/.ssh/id_rsa.pub)
EOF
pushd ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}
openshift-install create manifests
popd
cp ${AWS_CFG_DIR}/credoutput/manifests/* ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}/manifests
cp -r ${AWS_CFG_DIR}/credoutput/tls ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}
openshift-install create cluster --dir=${AWS_CFG_DIR}/${OCP_CLUSTER_NAME} --log-level=debug
export KUBECONFIG=${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}/auth/kubeconfig
oc create ns openshift-storage
cat << EOF | oc apply -f -
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-storage
  namespace: openshift-storage
spec:
  targetNamespaces:
  - openshift-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/ocs-operator.openshift-storage: ""
  name: odf-operator
  namespace: openshift-storage
spec:
  channel: stable-4.12
  installPlanApproval: Automatic
  name: odf-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
Now wait until the OpenShift Data Foundation operator is ready.
watch "oc get csv -n openshift-storage -l operators.coreos.com/ocs-operator.openshift-storage --no-headers -o custom-columns='name:metadata.name,phase:status.phase'"
Patch the OpenShift console to enable the ODF console plugin:
oc patch console.operator cluster \
-n openshift-storage \
--type json \
-p '[{"op": "add", "path": "/spec/plugins", "value": ["odf-console"]}]'
cat << EOF | oc apply -f -
---
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  annotations:
    uninstall.ocs.openshift.io/cleanup-policy: delete
    uninstall.ocs.openshift.io/mode: graceful
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  multiCloudGateway:
    dbStorageClassName: gp3-csi
    reconcileStrategy: standalone
EOF
Wait for the StorageCluster to reconcile. It never does, because the operator fails to create the backing store.
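To see why the backing store is missing, these standard diagnostic commands help (a debugging addition, not part of the original reproduction steps):

```shell
# Inspect the NooBaa resources to see why the default backing store is missing.
oc get backingstore -n openshift-storage
oc describe noobaa -n openshift-storage noobaa
# The CredentialsRequest should have produced this secret; with STS it never appears:
oc get secret -n openshift-storage noobaa-aws-cloud-creds-secret
oc logs -n openshift-storage deployment/noobaa-operator --tail=50
```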
cat << EOF | oc apply -f -
apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  name: noobaa-default-backing-store
  namespace: openshift-storage
spec:
  pvPool:
    numVolumes: 1
    resources:
      requests:
        storage: 100Gi
    secret: {}
    storageClass: gp3-csi
  type: pv-pool
EOF
echo "Go to console: https://$(oc get route --no-headers -n openshift-console console -o custom-columns='host:.spec.host')"
echo "Log in as kubeadmin, password $(cat ${AWS_CFG_DIR}/${OCP_CLUSTER_NAME}/auth/kubeadmin-password)"
openshift-install destroy cluster --dir=${AWS_CFG_DIR}/${OCP_CLUSTER_NAME} --log-level=debug
We found a way to work around the issue by creating the backing store that the StorageCluster expects. The backing store is based on a PVC instead of AWS S3. This is not ideal, but it lets us progress with provisioning MCG.
This has been resolved by using OpenShift 4.14.
Issue reopened. The StorageCluster does reach a Ready state in OpenShift 4.14, but the BackingStore stays in the BackingStorePhaseRejected state and no bucket is created for the cluster, meaning that any attempt to access the bucket fails.
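The rejected state can be observed with the following commands (an illustrative addition for debugging):

```shell
# Show the BackingStore phase (BackingStorePhaseRejected) and the rejection reason.
oc get backingstore -n openshift-storage noobaa-default-backing-store \
  -o jsonpath='{.status.phase}{"\n"}'
oc describe backingstore -n openshift-storage noobaa-default-backing-store
```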
The following changes are needed:
Create the namespace with the correct label:
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
  name: openshift-storage
Update CredentialsRequest to work with ServiceAccount:
oc get credentialsrequest -n openshift-storage noobaa-aws-cloud-creds -o yaml > nooba-credreq.yaml
NOOBA_BUCKET=$(grep 'arn:aws:s3:::' nooba-credreq.yaml | head -1 | awk -F: '{print $7}')
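The awk expression above assumes the matched manifest line has the form `resource: arn:aws:s3:::<bucket>`; splitting on ':' then puts the bucket name in field 7. A quick way to verify the extraction on a sample line (the bucket name here is hypothetical):

```shell
# A CredentialsRequest contains a line like "resource: arn:aws:s3:::<bucket>".
# Splitting on ':' yields: "resource", " arn", "aws", "s3", "", "", "<bucket>",
# so field 7 is the bucket name.
echo "      resource: arn:aws:s3:::noobaa-sample-bucket" | awk -F: '{print $7}'
# -> noobaa-sample-bucket
```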
# Edit nooba-credreq.yaml and add the following at the end of the spec:
#   serviceAccountNames:
#   - noobaa
ccoctl aws create-iam-roles --name="${OCP_CLUSTER_NAME}" --region="${AWS_REGION}" --credentials-requests-dir=. --identity-provider-arn=arn:aws:iam::872255850422:oidc-provider/${OCP_CLUSTER_NAME}-oidc.s3.${AWS_REGION}.amazonaws.com
aws s3api create-bucket --bucket ${NOOBA_BUCKET} --region ${AWS_REGION} --create-bucket-configuration LocationConstraint=${AWS_REGION}
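Optionally confirm the bucket was created and is reachable before applying the BackingStore (an illustrative addition):

```shell
# head-bucket returns non-zero if the bucket does not exist or is inaccessible.
aws s3api head-bucket --bucket "${NOOBA_BUCKET}" && echo "Bucket ${NOOBA_BUCKET} is reachable"
# Print the bucket's region (LocationConstraint) to confirm it matches AWS_REGION.
aws s3api get-bucket-location --bucket "${NOOBA_BUCKET}" --output text
```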
Create BackingStore:
cat <<EOF | oc apply -f -
apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  finalizers:
  - noobaa.io/finalizer
  labels:
    app: noobaa
  name: noobaa-default-backing-store
  namespace: openshift-storage
spec:
  awsS3:
    awsSTSRoleARN: arn:aws:iam::872255850422:oidc-provider/${OCP_CLUSTER_NAME}-oidc.s3.${AWS_REGION}.amazonaws.com
    targetBucket: ${NOOBA_BUCKET}
    secret:
      name: noobaa-aws-cloud-creds-secret
      namespace: openshift-storage
  pvPool:
    numVolumes: 1
    resources:
      requests:
        storage: 50Gi
    secret: {}
    storageClass: gp3-csi
  type: pv-pool
EOF
Hi @fketelaars, is there any update on the MCG issue? There are several watsonx (WA, watsonx.ai, watsonx.data) PoC requests on HCP. Due to the MCG issue, we are not able to move forward on these.
Describe the bug: Watson Assistant installation fails.
To Reproduce: While installing the watson_assistant cartridge using cloud-pak-deployer on an AWS environment, the installation fails with the message below. The "openshift-storage" namespace is also not created. The same issue may be present for watson-discovery too.
TASK [cp4d-cartridge-install : Set up Multicloud Object Gateway (MCG) secrets for watson_assistant in CP4D project cpd, logs are in /home/ec2-user/cpd-status/log/cpd-watson_assistant-setup-mcg.log] ***
Thursday 09 November 2023 07:34:20 +0000 (0:00:00.051) 0:26:42.451 ***
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -o pipefail\nsetup-mcg \\n --components=watson_assistant \\n --cpd_instance_ns=cpd \\n --noobaa_account_secret=noobaa-admin \\n --noobaa_cert_secret=noobaa-s3-serving-cert | tee /home/ec2-user/cpd-status/log/cpd-watson_assistant-setup-mcg.log\n", "delta": "0:00:00.147544", "end": "2023-11-09 07:34:21.267867", "msg": "non-zero return code", "rc": 1, "start": "2023-11-09 07:34:21.120323", "stderr": "Error from server (NotFound): secrets \"noobaa-admin\" not found", "stderr_lines": ["Error from server (NotFound): secrets \"noobaa-admin\" not found"], "stdout": "Running the setup for the watson_assistant component using the cpd project.", "stdout_lines": ["Running the setup for the watson_assistant component using the cpd project."]}
PLAY RECAP *****
localhost : ok=1235 changed=148 unreachable=0 failed=1 skipped=575 rescued=0 ignored=0
Thursday 09 November 2023 07:34:21 +0000 (0:00:00.411) 0:26:42.862 *****
cp4d-scheduling-service : Run scheduler installation script, output can be found in /home/ec2-user/cpd-status/log/cpd-apply-scheduler.log - 308.72s
cp4d-cluster : Run script to setup instance topology, output can be found in /home/ec2-user/cpd-status/log/cpd-setup-instance-topology.log - 205.80s
cp4d-subscriptions : Run apply-olm command to install cartridge subscriptions, logs are in /home/ec2-user/cpd-status/log/cpd-apply-olm-cartridge-sub.log - 183.72s
cp-fs-cluster-components : Run shell script to apply cluster components, logs are in /home/ec2-user/cpd-status/log/cpd-apply-cluster-components.log - 176.57s
cp4d-catalog-source : Run apply-olm command to create catalog sources, logs are in /home/ec2-user/cpd-status/log/apply-olm-create-catsrc.log - 173.82s
cp4d-catalog-source : Generate preview script to create catalog sources, logs are in /home/ec2-user/cpd-status/log/apply-olm-create-catsrc.log - 102.04s
cp4d-subscriptions : Generate preview script to install cartridge subscriptions, logs are in /home/ec2-user/cpd-status/log/cpd-apply-olm-cartridge-sub.log -- 30.60s
cp4d-cluster : Run apply-cr command to install Cloud Pak for Data platform, logs are in /home/ec2-user/cpd-status/log/cpd-apply-cr-cpd-platform.log -- 24.82s
cp4d-cluster : Run script to authorize instance, output can be found in /home/ec2-user/cpd-status/log/cpd-authorize-instance.log -- 17.93s
cp4d-cluster : Generate preview script to install Cloud Pak for Data platform, logs are in /home/ec2-user/cpd-status/log/cpd-apply-cr-cpd-platform.log -- 15.52s
openshift-download-installer : Unpack OpenShift installer -------------- 15.39s
cpd-cli-download : Unpack cpd-cli from /home/ec2-user/cpd-status/downloads/cpd-cli-linux-amd64.tar.gz -- 12.66s
aws-download-cli : Unpack aws-cli client installer ---------------------- 7.72s
openshift-download-client : Unpack OpenShift client from /home/ec2-user/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 5.20s
openshift-download-client : Unpack OpenShift client from /home/ec2-user/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 3.38s
ibm-pak-download : Extract ibm-pak from /home/ec2-user/cpd-status/downloads/oc-ibm_pak-linux-amd64.tar.gz --- 3.36s
openshift-download-client : Unpack OpenShift client from /home/ec2-user/cpd-status/downloads/openshift-client-linux.tar.gz-4.12 --- 3.26s
cloudctl-download : Unpack cloudctl from /home/ec2-user/cpd-status/downloads/cloudctl-linux-amd64.tar.gz --- 3.03s
cp4d-cluster : Run apply-entitlement command ---------------------------- 2.62s
cp4d-variables : Add versions details from olm-utils -------------------- 2.60s
==================================================================================== Deployer FAILED. Check previous messages. If command line is not returned, press ^C.
Expected behavior: Watson Assistant should install successfully.
Environment: AWS - self-managed and ROSA OpenShift.