Closed: rkomandu closed this issue 7 months ago.
@rkomandu Do you have must-gather logs somewhere? Alternatively, you can check the logs of one of the CSI pods, for example: oc logs ibm-spectrum-scale-csi-2sfd6 --previous -n ibm-spectrum-scale-csi
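For reference, a rough sketch of how the previous (pre-restart) logs can be pulled from all CSI driver pods in one go; the namespace matches this deployment, but the label selector is an assumption, so verify it first:

```sh
# Dump the previous (pre-restart) logs of every CSI driver pod.
# The label selector is an assumption; verify it with:
#   oc get pods --show-labels -n ibm-spectrum-scale-csi
for pod in $(oc get pods -n ibm-spectrum-scale-csi -l app=ibm-spectrum-scale-csi -o name); do
  echo "==== ${pod} ===="
  # Add "-c <container>" if oc reports that the pod has more than one container.
  oc logs "${pod}" --previous -n ibm-spectrum-scale-csi
done
```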
It shows the following. As you can see, CNSA itself didn't have any issue with restarts:
I0830 06:42:13.599431 1 main.go:48] Version Info: commit (cd7c8950650d7f25ba8c7c3e9ff0fac251cd050e)
I0830 06:42:13.599534 1 gpfs.go:118] gpfs GetScaleDriver
I0830 06:42:13.599541 1 gpfs.go:194] gpfs SetupScaleDriver. name: spectrumscale.csi.ibm.com, version: 2.7.0, nodeID: worker1.sieve.cp.fyre.ibm.com
I0830 06:42:13.599548 1 gpfs.go:236] gpfs PluginInitialize
I0830 06:42:13.599554 1 scale_config.go:94] scale_config LoadScaleConfigSettings
I0830 06:42:13.599724 1 scale_config.go:117] scale_config HandleSecrets
I0830 06:42:13.599788 1 gpfs.go:436] gpfs ValidateScaleConfigParameters.
I0830 06:42:13.599807 1 connectors.go:118] connector GetSpectrumScaleConnector
I0830 06:42:13.599812 1 rest_v2.go:135] rest_v2 NewSpectrumRestV2.
I0830 06:42:13.599820 1 rest_v2.go:158] Created Spectrum Scale connector without SSL mode for ibm-spectrum-scale-gui-ibm-spectrum-scale.apps.sieve.cp.fyre.ibm.com
I0830 06:42:13.599824 1 rest_v2.go:173] rest_v2 GetClusterId
I0830 06:42:13.599835 1 http_utils.go:60] http_utils FormatURL. url: https://ibm-spectrum-scale-gui-ibm-spectrum-scale.apps.sieve.cp.fyre.ibm.com:443/
I0830 06:42:13.599843 1 rest_v2.go:991] rest_v2 doHTTP. endpoint: https://ibm-spectrum-scale-gui-ibm-spectrum-scale.apps.sieve.cp.fyre.ibm.com:443/scalemgmt/v2/cluster, method: GET, param: <nil>
I0830 06:42:13.599847 1 http_utils.go:74] http_utils HttpExecuteUserAuth. type: GET, url: https://ibm-spectrum-scale-gui-ibm-spectrum-scale.apps.sieve.cp.fyre.ibm.com:443/scalemgmt/v2/cluster, user: CsiAdmin
I0830 06:42:13.633722 1 http_utils.go:44] http_utils UnmarshalResponse. response: &{0x584380 0xc0000a0c00 0x634fc0}
I0830 06:42:13.633970 1 rest_v2.go:43] rest_v2 isStatusOK. statusCode: 200
I0830 06:42:13.633985 1 rest_v2.go:237] rest_v2 GetFilesystemMountDetails. filesystemName: remote-sample
I0830 06:42:13.633990 1 rest_v2.go:991] rest_v2 doHTTP. endpoint: https://ibm-spectrum-scale-gui-ibm-spectrum-scale.apps.sieve.cp.fyre.ibm.com:443/scalemgmt/v2/filesystems/remote-sample, method: GET, param: <nil>
I0830 06:42:13.633999 1 http_utils.go:74] http_utils HttpExecuteUserAuth. type: GET, url: https://ibm-spectrum-scale-gui-ibm-spectrum-scale.apps.sieve.cp.fyre.ibm.com:443/scalemgmt/v2/filesystems/remote-sample, user: CsiAdmin
I0830 06:42:13.644583 1 http_utils.go:44] http_utils UnmarshalResponse. response: &{0x584380 0xc00013e240 0x634fc0}
I0830 06:42:13.644786 1 rest_v2.go:43] rest_v2 isStatusOK. statusCode: 400
E0830 06:42:13.644805 1 rest_v2.go:244] Unable to get filesystem details for remote-sample: Remote call completed with error [400 Bad Request]. Error in response [&{[] {400 Invalid value in filesystemName} {}}]
E0830 06:42:13.644815 1 gpfs.go:277] Error in getting filesystem details for remote-sample
E0830 06:42:13.644820 1 gpfs.go:201] Error in plugin initialization: Remote call completed with error [400 Bad Request]. Error in response [&{[] {400 Invalid value in filesystemName} {}}]
F0830 06:42:13.644824 1 main.go:77] Failed to initialize Scale CSI Driver: Remote call completed with error [400 Bad Request]. Error in response [&{[] {400 Invalid value in filesystemName} {}}]
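For what it's worth, the failing REST call can be replayed directly against the GUI route shown in the log above; a minimal sketch (the CsiAdmin password is a placeholder, and -k mirrors the "without SSL mode" connector the driver reports):

```sh
# Replay the filesystem lookup that PluginInitialize performs. A 400 with
# "Invalid value in filesystemName" suggests the GUI does not know the
# filesystem yet (for example, creation or the remote mount is still in progress).
GUI_HOST=ibm-spectrum-scale-gui-ibm-spectrum-scale.apps.sieve.cp.fyre.ibm.com
curl -k -u 'CsiAdmin:<password>' \
  "https://${GUI_HOST}:443/scalemgmt/v2/filesystems/remote-sample"
```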
CR file snippet:
apiVersion: scale.spectrum.ibm.com/v1beta1
kind: Filesystem
metadata:
  labels:
    app.kubernetes.io/instance: ibm-spectrum-scale
    app.kubernetes.io/name: cluster
  name: remote-sample
  namespace: ibm-spectrum-scale
spec:
  remote:
    cluster: remotecluster-sample
    fs: fs1
---
apiVersion: scale.spectrum.ibm.com/v1beta1
kind: RemoteCluster
metadata:
  labels:
    app.kubernetes.io/instance: ibm-spectrum-scale
    app.kubernetes.io/name: cluster
  name: remotecluster-sample
  namespace: ibm-spectrum-scale
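Before the CSI driver starts, it may also help to check whether these CRs have actually reconciled; a rough sketch using the resource names from the CRs above (the exact status fields vary by release):

```sh
# Check the state of the remote filesystem and remote cluster CRs.
oc get filesystems.scale.spectrum.ibm.com -n ibm-spectrum-scale
oc get remoteclusters.scale.spectrum.ibm.com -n ibm-spectrum-scale

# Inspect the conditions the operator reports for the remote filesystem.
oc describe filesystems.scale.spectrum.ibm.com remote-sample -n ibm-spectrum-scale
```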
Hi Ravi, the CSI driver was restarting because filesystem creation took time, and since the CSI driver restarts, the sidecars also get restarted.
@deeghuge, was this also observed in other clusters? If so, it would be worth documenting this behavior IMO.
@deeghuge @rkomandu could you please help add the FQI labels?
Deployed the RC4 build on Fyre and haven't come across restarts like before for pods in the CSI namespace. Only the attacher was restarted, as shown below, possibly due to a liveness probe? If this is expected, then close this task.
NAME                                                  READY   STATUS    RESTARTS        AGE    IP              NODE                            NOMINATED NODE   READINESS GATES
ibm-spectrum-scale-csi-6c5nv                          3/3     Running   0               3m7s   10.17.63.249    worker2.sieve.cp.fyre.ibm.com   <none>           <none>
ibm-spectrum-scale-csi-attacher-5cfdf6df9-8ldtc       1/1     Running   1 (2m47s ago)   3m7s   10.254.21.162   worker0.sieve.cp.fyre.ibm.com   <none>           <none>
ibm-spectrum-scale-csi-attacher-5cfdf6df9-nhwq5       1/1     Running   0               3m7s   10.254.13.148   worker1.sieve.cp.fyre.ibm.com   <none>           <none>
ibm-spectrum-scale-csi-bjvpk                          3/3     Running   0               3m7s   10.17.59.29     worker1.sieve.cp.fyre.ibm.com   <none>           <none>
ibm-spectrum-scale-csi-gl57f                          3/3     Running   0               3m7s   10.17.59.26     worker0.sieve.cp.fyre.ibm.com   <none>           <none>
ibm-spectrum-scale-csi-operator-5477865bf-7dq7c       1/1     Running   0               18m    10.254.21.157   worker0.sieve.cp.fyre.ibm.com   <none>           <none>
ibm-spectrum-scale-csi-provisioner-869fb676b5-vx4pk   1/1     Running   0               3m7s   10.254.13.150   worker1.sieve.cp.fyre.ibm.com   <none>           <none>
ibm-spectrum-scale-csi-resizer-6c89db96bc-p2hph       1/1     Running   0               3m7s   10.254.13.151   worker1.sieve.cp.fyre.ibm.com   <none>           <none>
ibm-spectrum-scale-csi-snapshotter-76d564cb76-vnhph   1/1     Running   0               3m7s   10.254.13.149   worker1.sieve.cp.fyre.ibm.com   <none>           <none>
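To confirm whether that single attacher restart really came from a failed liveness probe, the pod's last state and events can be checked; a sketch using the pod name from the listing above:

```sh
# Show why the attacher container restarted (exit code / reason of the last state).
oc describe pod ibm-spectrum-scale-csi-attacher-5cfdf6df9-8ldtc -n ibm-spectrum-scale-csi

# Look for "Liveness probe failed" / "Killing" events against that pod.
oc get events -n ibm-spectrum-scale-csi \
  --field-selector involvedObject.name=ibm-spectrum-scale-csi-attacher-5cfdf6df9-8ldtc
```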
With the latest changes, we are not seeing restarts like the ones explained in this issue. Closing now; please reopen if you see the issue again.
Describe the bug
Observed that the CSI pods restarted on a fresh deployment of the FQDN build (29th Aug) from the #scale-cnsa-builds channel.
How to Reproduce?
https://ibm-systems-storage.slack.com/archives/C02KCKGH755/p1661827704750079
The scale_v1beta1 cluster CR file was taken from https://github.ibm.com/IBMSpectrumScale/ibm-spectrum-scale-container-native/blob/v5.1.5.0/generated/scale_v1beta1_cluster_cr.yaml
After applying:
[root@api.sieve.cp.fyre.ibm.com FQDN-29Aug]# oc apply -f ./scale_v1beta1_29Augbuild.yaml
namespace/ibm-spectrum-scale created
serviceaccount/ibm-spectrum-scale-core created
serviceaccount/ibm-spectrum-scale-default created
serviceaccount/ibm-spectrum-scale-gui created
serviceaccount/ibm-spectrum-scale-pmcollector created
role.rbac.authorization.k8s.io/ibm-spectrum-scale-sysmon created
rolebinding.rbac.authorization.k8s.io/ibm-spectrum-scale-privileged created
rolebinding.rbac.authorization.k8s.io/ibm-spectrum-scale-sysmon created
callhome.scale.spectrum.ibm.com/callhome created
cluster.scale.spectrum.ibm.com/ibm-spectrum-scale created
filesystem.scale.spectrum.ibm.com/remote-sample created
remotecluster.scale.spectrum.ibm.com/remotecluster-sample created
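A simple way to watch whether any pods restart while the filesystem is still being created (plain watches on the two namespaces used in this deployment):

```sh
# Watch the CNSA core pods and the CSI pods come up; the RESTARTS column
# should stay at 0 once the remote filesystem is mounted.
oc get pods -n ibm-spectrum-scale -o wide -w
oc get pods -n ibm-spectrum-scale-csi -o wide -w
```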
Expected behavior
The pods shouldn't restart multiple times on a freshly created system. Is this expected? As I understand it, anyone who deploys the scale beta file freshly should hit this.
Data Collection and Debugging
Environmental output
kubectl get pods -o wide -n <csi driver namespace>
kubectl get nodes -o wide
./tools/spectrum-scale-driver-snap.sh -n <csi driver namespace> -v
Tool to collect the CSI snap:
./tools/spectrum-scale-driver-snap.sh -n <csi driver namespace>