IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0

Remote Cluster - Fail to connect to cluster with error "Unable to get cluster ID: json.Unmarshal failed invalid character 'p' after top-level value" #1031

Closed superleo closed 1 year ago

superleo commented 1 year ago

Describe the bug

Fail to connect to cluster with error "Unable to get cluster ID: json.Unmarshal failed invalid character 'p' after top-level value"

How to Reproduce?


  1. In a remote cluster environment, deploy as described in: https://www.ibm.com/docs/en/spectrum-scale-csi?topic=operator-remote-cluster-support
  2. Use the YAML file below to deploy the CSI driver (`cat gpfs.yaml`):

    apiVersion: csi.ibm.com/v1
    kind: "CSIScaleOperator"
    metadata:
      name: "ibm-spectrum-scale-csi"
      namespace: "ibm-spectrum-scale-csi-driver"
      labels:
        app.kubernetes.io/name: ibm-spectrum-scale-csi-operator
        app.kubernetes.io/instance: ibm-spectrum-scale-csi-operator
        app.kubernetes.io/managed-by: ibm-spectrum-scale-csi-operator
        release: ibm-spectrum-scale-csi-operator
    status: {}
    spec:
      clusters:
        - id: "1052530068846563007"
          secrets: "accessgpfssceret"
          secureSslMode: false
          primary:
            primaryFs: "101"
            remoteCluster: "17121353980023427834"
          restApi:
            - guiHost: "host1111"
        - id: '17121353980023427834'
          secrets: "owneinggpfssceret"
          secureSslMode: false
          restApi:
            - guiHost: "GPFS-22222.local" # Multiple GUIs can be provided here also similar to primary cluster.
      attacherNodeSelector:
        - key: "scale"
          value: "true"
      provisionerNodeSelector:
        - key: "scale"
          value: "true"
      pluginNodeSelector:
        - key: "scale"
          value: "true"
      snapshotterNodeSelector:
        - key: "scale"
          value: "true"
      resizerNodeSelector:
        - key: "scale"
          value: "true"
    ---

Expected behavior

It should connect successfully

Data Collection and Debugging

    2023-09-26T11:31:23.901Z INFO csiscaleoperator_controller.Reconcile CSI setup started.
    2023-09-26T11:31:23.902Z INFO csiscaleoperator_controller.Reconcile Fetching CSIScaleOperator instance.
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.Reconcile Adding Finalizer
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.getAccessorAndFinalizerName Got finalizer with {"name": "finalizer.csiscaleoperators.csi.ibm.com"}
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.addFinalizerIfNotPresent Finalizer was added with {"name": "finalizer.csiscaleoperators.csi.ibm.com"}
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.Reconcile Checking if CSIScaleOperator object got deleted
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.checkPrerequisite Checking pre-requisites.
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.resourceExists Checking resource exists {"Kind": "Secret", "Name": "accessgpfssceret"}
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.checkPrerequisite Secret resource accessgpfssceret found.
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.resourceExists Checking resource exists {"Kind": "Secret", "Name": "gpfs101sceret"}
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.checkPrerequisite Secret resource gpfs101sceret found.
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.Reconcile Pre-requisite check passed.
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.ValidateCRParams Validating the Spectrum Scale CSI configurations of the resource CSIScaleOperator/ibm-spectrum-scale-csi
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.Reconcile The Spectrum Scale CSI configurations are validated successfully
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.handleSpectrumScaleConnectors Checking spectrum scale connectors
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.newSpectrumScaleConnector Creating new SpectrumScaleConnector for cluster with {"ID": "1052530068846563007"}
    2023-09-26T11:31:23.919Z INFO csiscaleoperator_controller.newSpectrumScaleConnector Created Spectrum Scale connector without SSL mode for guiHost(s)
    E0926 11:31:23.936856 1 rest_v2.go:193] [] Unable to get cluster ID: json.Unmarshal failed invalid character 'p' after top-level value
    2023-09-26T11:31:23.936Z ERROR csiscaleoperator_controller.handleSpectrumScaleConnectors Failed to connect to the GUI of the cluster with ID: 1052530068846563007 {"error": "json.Unmarshal failed invalid character 'p' after top-level value"}
    github.com/IBM/ibm-spectrum-scale-csi/operator/controllers.(*CSIScaleOperatorReconciler).Reconcile
        /workspace/controllers/csiscaleoperator_controller.go:296
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227
    2023-09-26T11:31:23.937Z ERROR csiscaleoperator_controller.Reconcile Error in getting connectors {"error": "json.Unmarshal failed invalid character 'p' after top-level value"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227
    2023-09-26T11:31:23.937Z INFO csiscaleoperator_controller.SetStatus Assigning values to status sub-resource object.
    2023-09-26T11:31:23.937Z DEBUG csiscaleoperator_controller.SetStatus Setting status of CSIScaleOperator is successful
    2023-09-26T11:31:23.937Z DEBUG events Warning {"object": {"kind":"CSIScaleOperator","namespace":"ibm-spectrum-scale-csi-driver","name":"ibm-spectrum-scale-csi","uid":"23bd3bf3-ab2c-487a-8b3e-418743606649","apiVersion":"csi.ibm.com/v1","resourceVersion":"10838226"}, "reason": "GUIConnFailed", "message": "Failed to connect to the GUI of the cluster with ID: 10525300????"}
    2023-09-26T11:31:23.950Z DEBUG csiscaleoperator_controller.Reconcile Updated resource status. {"Status": {"versions":[{"name":"ibm-spectrum-scale-csi","version":"2.9.0"}],"conditions":[{"type":"Success","status":"False","lastTransitionTime":"2023-09-26T11:31:23Z","reason":"GUIConnFailed","message":"Failed to connect to the GUI of the cluster with ID: 105253???????"}]}}
    2023-09-26T11:31:23.950Z ERROR controller.csiscaleoperator Reconciler error {"reconciler group": "csi.ibm.com", "reconciler kind": "CSIScaleOperator", "name": "ibm-spectrum-scale-csi", "namespace": "ibm-spectrum-scale-csi-driver", "error": "json.Unmarshal failed invalid character 'p' after top-level value"}
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227

Environmental output

Tool to collect the CSI snap:

./tools/spectrum-scale-driver-snap.sh -n < csi driver namespace>


Jainbrt commented 1 year ago

@superleo can you please check if your GUI/REST API interface is working fine for both clusters?

superleo commented 1 year ago

> @superleo can you please check if your GUI/REST API interface is working fine for both clusters?

Thank you, Jainbrt. I can log in to the GUI/REST server, and curl succeeds. See below.

The accessing cluster GUI:

    # curl --insecure -u 'user???:pass???' -X GET https://192.168.3.211:30443/scalemgmt/v2/cluster
    {
      "cluster" : {
        "clusterSummary" : {
          "clusterId" : 1052530068846563007,
          "clusterName" : "master1.cluster.local",
          "primaryServer" : "master1.cluster.local",
          "rcpPath" : "/usr/bin/scp",
          "rcpSudoWrapper" : false,
          "repositoryType" : "CCR",
          "rshPath" : "/usr/bin/ssh",
          "rshSudoWrapper" : false,
          "uidDomain" : "master1.cluster.local"
        }
      },
      "status" : {
        "code" : 200,
        "message" : "The request finished successfully."
      }
    }

The owning cluster GUI (output truncated):

    curl --insecure -u 'user:????' -X GET https://10.10.1.32:443/scalemgmt/v2/cluster
    {
      "cluster" : {
        "clusterSummary" : {
          "clusterId" : 17121353980023427834,
          "clusterName" : "GPFS-???.local",
          "primaryServer" : "GPFS-???.local",
          "rcpPath" : "/usr/bin/scp",
          "rcpSudoWrapper" : false,
          "repositoryType" : "CCR",
          "rshPath" : "/usr/bin/ssh",
          "rshSudoWrapper" : false,
          "uidDomain" : "GPFS-???.local"
        },
        "capacityLicensing" : {
          "liableCapacity" : 439804651110400,
          "liableNsdCount" : 8,
          "liableNsds" : [
            { "nsdName" : "nsd1", "liableCapacity" : 54975581388800 },
            { "nsdName" : "nsd2", "liableCapacity" : 54975581388800 },
            { "nsdName" : "nsd3",
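Both responses report `clusterId` as a bare JSON number. The owning cluster's value, 17121353980023427834, is larger than the maximum signed 64-bit integer, so a client that decodes it into a float or an `int64` cannot compare it reliably against the quoted `id` strings in the manifest. The following is a minimal Go sketch of reading the ID without losing digits. The `clusterResponse` type and `clusterID` helper are hypothetical names used for illustration and are not taken from the CSI driver source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterResponse models only the field of interest from a
// /scalemgmt/v2/cluster response (hypothetical type, for illustration).
type clusterResponse struct {
	Cluster struct {
		ClusterSummary struct {
			// json.Number keeps the literal digits instead of converting to float64.
			ClusterID json.Number `json:"clusterId"`
		} `json:"clusterSummary"`
	} `json:"cluster"`
}

// clusterID extracts the cluster ID from a response body as a string.
func clusterID(body string) (string, error) {
	var resp clusterResponse
	if err := json.Unmarshal([]byte(body), &resp); err != nil {
		return "", err
	}
	return resp.Cluster.ClusterSummary.ClusterID.String(), nil
}

func main() {
	// Body trimmed from the owning cluster response shown in the thread.
	body := `{"cluster":{"clusterSummary":{"clusterId":17121353980023427834}}}`
	id, err := clusterID(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// Compare as strings against the id configured in the manifest.
	fmt.Println(id == "17121353980023427834")
}
```

Comparing the IDs as strings sidesteps any numeric range question, which matches how the manifest quotes them.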

superleo commented 1 year ago

It seems to be a JSON format problem related to tabs/spaces. Maybe a Python library problem?
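For context on the message itself: the text after "json.Unmarshal failed" is what Go's `encoding/json` reports when the input starts with a complete JSON value and is then followed by other characters. One body that produces exactly this text is a plain-text reply such as `404 page not found`, where `404` parses as a number and `p` is the first unexpected character. The sketch below only demonstrates that behavior. Both sample bodies are hypothetical, and the thread does not establish what the operator actually received.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeError returns the error text json.Unmarshal produces for body,
// or an empty string if the body decodes cleanly.
func decodeError(body string) string {
	var v interface{}
	if err := json.Unmarshal([]byte(body), &v); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// Hypothetical bodies; neither is captured from the reporter's environment.
	fmt.Println(decodeError("404 page not found"))
	// prints: invalid character 'p' after top-level value
	fmt.Println(decodeError(`{"status": {"code": 200}}`) == "")
	// prints: true
}
```

If a similar error appears, it can help to log the raw response body and HTTP status for the exact host and port the operator is configured to use, since the curl test in this thread targeted port 30443 while the manifest lists only a `guiHost`.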

superleo commented 1 year ago

Resolved by recreating the YAML file.