IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0

[MDD] Testing CSI Support on Ubuntu 16.04.6 LTS #123

Closed mew2057 closed 4 years ago

mew2057 commented 4 years ago

A smoke test of the CSI driver and operator must be performed to verify that they are fully functional on Ubuntu 16.04.6 LTS.

Verification process:

mew2057 commented 4 years ago
root@jdun-ub-1:~# cat /etc/*rele*
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
mew2057 commented 4 years ago
root@jdun-ub-1:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
mew2057 commented 4 years ago
root@jdun-ub-1:~# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         jdun-ub-1.fyre.ibm.com
  GPFS cluster id:           16807700128967085992
  GPFS UID domain:           jdun-ub-1.fyre.ibm.com
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name        IP address      Admin node name         Designation
-----------------------------------------------------------------------------------
   1   jdun-ub-2.fyre.ibm.com  172.16.196.66   jdun-ub-2.fyre.ibm.com  quorum-manager-perfmon
   2   jdun-ub-3.fyre.ibm.com  172.16.238.175  jdun-ub-3.fyre.ibm.com  quorum-manager-perfmon
   3   jdun-ub-1.fyre.ibm.com  172.16.194.157  jdun-ub-1.fyre.ibm.com  quorum-perfmon
mew2057 commented 4 years ago
root@jdun-ub-1:~# curl https://raw.githubusercontent.com/IBM/ibm-spectrum-scale-csi/v1.1.0/generated/installer/ibm-spectrum-scale-csi-operator-dev.yaml > ibm-spectrum-scale-csi-operator-dev.yaml
root@jdun-ub-1:~# kubectl create namespace ibm-spectrum-scale-csi-driver
root@jdun-ub-1:~# kubectl apply -f ibm-spectrum-scale-csi-operator-dev.yaml
root@jdun-ub-1:~# kubectl get pods -n ibm-spectrum-scale-csi-driver
NAME                                               READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-csi-operator-5f5f549754-n4flh   2/2     Running   0          17h
mew2057 commented 4 years ago
root@jdun-ub-1:~# cat ibm-spectrum-scale-csi-operator-cr.yaml
apiVersion: csi.ibm.com/v1
kind: 'CSIScaleOperator'
metadata:
    name: 'ibm-spectrum-scale-csi'
    namespace: 'ibm-spectrum-scale-csi-driver'
    labels:
      app.kubernetes.io/name: ibm-spectrum-scale-csi-operator
      app.kubernetes.io/instance: ibm-spectrum-scale-csi-operator
      app.kubernetes.io/managed-by: ibm-spectrum-scale-csi-operator
      release: ibm-spectrum-scale-csi-operator
status: {}
spec:
# The path to the GPFS file system mounted on the host machine.
# ==================================================================================
  scaleHostpath: "/ibm/fs1"
  attacher: 'quay.io/k8scsi/csi-attacher:v1.0.0'
  provisioner: 'quay.io/k8scsi/csi-provisioner:v1.5.0'
  driverRegistrar: "quay.io/k8scsi/csi-node-driver-registrar:v1.0.1"
  #spectrumScale: "quay.io/ibm-spectrum-scale/ibm-spectrum-scale-csi-driver:v1.0.0"
  nodeMapping:
    - k8sNode: "jdun-ub-1"
      spectrumscaleNode: "jdun-ub-1.fyre.ibm.com"
    - k8sNode: "jdun-ub-2"
      spectrumscaleNode: "jdun-ub-2.fyre.ibm.com"
    - k8sNode: "jdun-ub-3"
      spectrumscaleNode: "jdun-ub-3.fyre.ibm.com"
  clusters:
    - id: "16807700128967085992"
      secrets: "csisecret"
      secureSslMode: false
      primary:
        primaryFs: "fs1"
        primaryFset: "csiFset1"
      restApi:
        - guiHost: "10.233.0.1"
mew2057 commented 4 years ago
root@jdun-ub-1:~# kubectl apply -f csisecret.yaml
root@jdun-ub-1:~# kubectl apply -f ibm-spectrum-scale-csi-operator-cr.yaml
mew2057 commented 4 years ago

At this point I hit an issue in my GUI config:

root@jdun-ub-1:~# kubectl get pods -n ibm-spectrum-scale-csi-driver
NAME                                               READY   STATUS             RESTARTS   AGE
ibm-spectrum-scale-csi-79bxx                       1/2     CrashLoopBackOff   2          34s
ibm-spectrum-scale-csi-attacher-0                  1/1     Running            0          36s
ibm-spectrum-scale-csi-flm9s                       1/2     CrashLoopBackOff   2          34s
ibm-spectrum-scale-csi-operator-5f5f549754-n4flh   2/2     Running            0          17h
ibm-spectrum-scale-csi-provisioner-0               1/1     Running            0          35s
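
When the pod list gets longer, the partially ready pods can be picked out of the table mechanically. A minimal sketch, assuming the default `kubectl get pods` layout with a header row and READY in the second column; the function name `not_ready_pods` is made up for illustration:

```shell
#!/bin/bash
# Print the name of every pod whose READY column (for example "1/2") shows
# fewer ready containers than expected. Reads `kubectl get pods` output on
# stdin and skips the first line, which is assumed to be the header row.
not_ready_pods() {
  awk 'NR > 1 {
         split($2, ready, "/")
         if (ready[1] != ready[2]) print $1
       }'
}

# Typical use (needs a live cluster, so it is left commented out):
# kubectl get pods -n ibm-spectrum-scale-csi-driver | not_ready_pods
```

Run against the listing above, this would single out the two `ibm-spectrum-scale-csi-*` driver pods that are stuck at 1/2. With `--no-headers` the first pod would be skipped, so the sketch only fits the default output.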
mew2057 commented 4 years ago

Looks like something is going wrong in the authentication. Pretty sure this is a configuration problem, not an OS problem:

root@jdun-ub-1:~# kubectl logs -n ibm-spectrum-scale-csi-driver   ibm-spectrum-scale-csi-flm9s ibm-spectrum-scale-csi
I0304 15:34:56.460480       1 gpfs.go:58] gpfs GetScaleDriver
I0304 15:34:56.460541       1 gpfs.go:134] gpfs SetupScaleDriver. name: spectrumscale.csi.ibm.com, version: 1.1.0, nodeID: jdun-ub-2
I0304 15:34:56.460547       1 gpfs.go:171] gpfs PluginInitialize
I0304 15:34:56.460559       1 scale_config.go:104] scale_config LoadScaleConfigSettings
I0304 15:34:56.460951       1 scale_config.go:127] scale_config HandleSecrets
I0304 15:34:56.461000       1 gpfs.go:371] gpfs ValidateScaleConfigParameters.
I0304 15:34:56.461097       1 connectors.go:74] connector GetSpectrumScaleConnector
I0304 15:34:56.461113       1 rest_v2.go:117] rest_v2 NewSpectrumRestV2.
I0304 15:34:56.461122       1 rest_v2.go:139] Created Spectrum Scale connector without SSL mode for 10.233.0.1
I0304 15:34:56.461127       1 rest_v2.go:154] rest_v2 GetClusterId
I0304 15:34:56.461136       1 http_utils.go:60] http_utils FormatURL. url: https://10.233.0.1:443/
I0304 15:34:56.461152       1 rest_v2.go:560] rest_v2 doHTTP. endpoint: https://10.233.0.1:443/scalemgmt/v2/cluster, method: GET, param: <nil>
I0304 15:34:56.461165       1 http_utils.go:74] http_utils HttpExecuteUserAuth. type: GET, url: https://10.233.0.1:443/scalemgmt/v2/cluster, user: admin1
I0304 15:34:56.468624       1 http_utils.go:44] http_utils UnmarshalResponse. response: &{0x6dad10 0xc000154040 0x6e7e70}
E0304 15:34:56.468999       1 rest_v2.go:161] Unable to get cluster ID: json.Unmarshal failed json: cannot unmarshal string into Go struct field GetClusterResponse.status of type connectors.Status
E0304 15:34:56.469013       1 gpfs.go:195] Error getting cluster ID: json.Unmarshal failed json: cannot unmarshal string into Go struct field GetClusterResponse.status of type connectors.Status
E0304 15:34:56.469017       1 gpfs.go:141] Error in plugin initialization: json.Unmarshal failed json: cannot unmarshal string into Go struct field GetClusterResponse.status of type connectors.Status
F0304 15:34:56.469021       1 main.go:61] Failed to initialize Scale CSI Driver: json.Unmarshal failed json: cannot unmarshal string into Go struct field GetClusterResponse.status of type connectors.Status
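
The unmarshal error says the driver expected `status` to be a JSON object and received a string instead, so one way to narrow this down is to call the same endpoint by hand and look at the shape of `status`. A rough sketch, assuming `curl` and `grep` are available; `status_kind` is a made-up helper and the check is a grep heuristic, not a real JSON parser:

```shell
#!/bin/bash
# Classify the "status" member of a JSON body read from stdin:
#   object -> "status": { ... }  (the shape the driver tries to decode)
#   string -> "status": "..."    (the shape the error message complains about)
#   other  -> anything else, including a missing field
status_kind() {
  local body
  body="$(cat)"
  if printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*\{'; then
    echo object
  elif printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"'; then
    echo string
  else
    echo other
  fi
}

# Typical use against the endpoint from the log (PASSWORD is a placeholder;
# -k skips certificate verification, in line with secureSslMode: false):
# curl -sk -u "admin1:${PASSWORD}" https://10.233.0.1:443/scalemgmt/v2/cluster | status_kind
```

A result of `string` or `other` would suggest that whatever answers on that address is not returning the response the driver expects, which points at the address or credentials rather than the OS.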
mew2057 commented 4 years ago

I think I'm using the wrong address:

root@jdun-ub-1:~# /usr/lpp/mmfs/gui/cli/lsnode
Hostname               IP             Description     Role       Product version Connection status GPFS status Last updated
jdun-ub-1.fyre.ibm.com 172.16.194.157 Master GUI Node management 5.0.4.1         HEALTHY           HEALTHY     3/4/20 7:38 AM
jdun-ub-2.fyre.ibm.com 172.16.196.66                  storage    5.0.4.1         HEALTHY           HEALTHY     3/4/20 7:38 AM
jdun-ub-3.fyre.ibm.com 172.16.238.175                 storage    5.0.4.1         HEALTHY           HEALTHY     3/4/20 7:38 AM
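
To compare the `guiHost` value in the custom resource with what `lsnode` reports, the master GUI node's IP can be pulled out of the table. A small sketch, assuming the column layout shown above (hostname first, IP second, and the text "Master GUI Node" in the description); both function names are invented for illustration, and a mismatch is only a hint, since `guiHost` could also be a hostname or another reachable address:

```shell
#!/bin/bash
# Print the IP (second column) of the row described as "Master GUI Node".
# Reads `lsnode` output on stdin.
gui_master_ip() {
  awk '/Master GUI Node/ { print $2 }'
}

# Compare a guiHost value ($1) with the master GUI node IP found on stdin.
# Returns 0 on a match and 1 otherwise.
check_gui_host() {
  local gui_host="$1" master_ip
  master_ip="$(gui_master_ip)"
  if [ "$gui_host" = "$master_ip" ]; then
    echo "guiHost $gui_host matches the master GUI node"
  else
    echo "guiHost $gui_host differs from master GUI node $master_ip"
    return 1
  fi
}

# Typical use on the GUI node (left commented out):
# /usr/lpp/mmfs/gui/cli/lsnode | check_gui_host 10.233.0.1
```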
mew2057 commented 4 years ago
#!/bin/bash
echo "Stopping GUI..."
systemctl stop gpfsgui
echo "Cleaning database..."
psql postgres postgres -c "drop schema fscc cascade;"
echo "Cleaning CCR files..."
mmccr fdel _gui.settings
mmccr fdel _gui.user.repo
mmccr fdel _gui.keystore_settings
mmccr fdel _gui.policysettings
mmccr fdel _gui.dashboards
mmccr fdel _gui.notification
mmccr fdel gui_jobs
mmccr fdel gui
echo "Cleaning local CCR files..."
rm -f /var/lib/mmfs/gui/*.json*
echo "Cleaning logs..."
rm -rf /var/log/cnlog/mgtsrv/*
echo "Deleting callbacks..."
mmdelcallback GUI_CCR_CHANGE,GUI_CM_TAKEOVER,GPFS_STARTUP_SHUTDOWN,GUI_NODES,GUI_DISK_SPACE,GUI_MOUNT_ACTION
mmdelnodeclass GUI_MGMT_SERVERS,GUI_SERVERS

Removing GUI and trying again.

root@jdun-ub-1:/usr/lpp/mmfs/5.0.4.1/installer# ./spectrumscale deploy
...

Still hitting problems. Examined the journalctl output; looks like a bad user?

Mar 04 07:57:45 jdun-ub-1 sudo[8328]: pam_unix(sudo:auth): auth could not identify password for [scalemgmt]
Mar 04 07:57:45 jdun-ub-1 sudo[8328]: scalemgmt : command not allowed ; TTY=unknown ; PWD=/opt/ibm/wlp ; USER=root ; COMMAND=mmvdisk recoverygroup list
mew2057 commented 4 years ago

Found this SO thread: https://stackoverflow.com/questions/57657645/pam-unixsudoauth-conversation-failed-auth-could-not-identify-password-for

Adding the line "auth sufficient pam_permit.so" to /etc/pam.d/sudo and restarting the GUI.

mew2057 commented 4 years ago

Didn't fix anything.

mew2057 commented 4 years ago
Mar 04 08:15:34 jdun-ub-1 java[4579]: file: /var/lib/mmfs/gui/user.repo.json
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err] com.ibm.websphere.security.CustomRegistryException: Key com.ibm.fscc.common.config.ccr.CliUser not found in
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at userrepobundle.FsccUserRepository.updateCacheFromJsonFile(FsccUserRepository.java:100)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at userrepobundle.FsccUserRepository.initialize(FsccUserRepository.java:79)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at userrepobundle.FsccUserRepositoryActivator.updated(FsccUserRepositoryActivator.java:98)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at com.ibm.ws.config.admin.internal.ManagedServiceTracker$1.run(ManagedServiceTracker.java:272)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.FutureTask.run(FutureTask.java:277)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at com.ibm.ws.config.admin.internal.UpdateQueue$Queue.run(UpdateQueue.java:66)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.FutureTask.run(FutureTask.java:277)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPo
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecu
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.lang.Thread.run(Thread.java:818)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err] com.ibm.websphere.security.CustomRegistryException: Key com.ibm.fscc.common.config.ccr.CliUser not found in
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at userrepobundle.FsccUserRepository.updateCacheFromJsonFile(FsccUserRepository.java:100)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at userrepobundle.FsccUserRepository.initialize(FsccUserRepository.java:79)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at userrepobundle.FsccUserRepositoryActivator.updated(FsccUserRepositoryActivator.java:98)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at com.ibm.ws.config.admin.internal.ManagedServiceTracker$1.run(ManagedServiceTracker.java:272)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.FutureTask.run(FutureTask.java:277)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at com.ibm.ws.config.admin.internal.UpdateQueue$Queue.run(UpdateQueue.java:66)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.FutureTask.run(FutureTask.java:277)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPo
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecu
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
Mar 04 08:15:34 jdun-ub-1 java[4579]: [err]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)

Looks like this is my real error.

mew2057 commented 4 years ago

Ran /usr/lpp/mmfs/gui/cli/mkuser ${USERNAME} -p ${PASSWORD} -g CsiAdmin again and generated an admin1 user.

mew2057 commented 4 years ago

And another thing:

Mar 04 09:06:08 jdun-ub-1 systemd[1]: gpfsgui.service: Watchdog timeout (limit 1min)!
Mar 04 09:06:08 jdun-ub-1 java[18063]: JVMDUMP039I Processing dump event "abort", detail "" at 2020/03/04 09:06:08 - please w
Mar 04 09:06:08 jdun-ub-1 java[18063]: IBM Java[18063]: JVMDUMP039I Processing dump event "abort", detail "" at 2020/03/04 09
Mar 04 09:06:08 jdun-ub-1 java[18063]: JVMDUMP032I JVM requested System dump using '/var/crash/scalemgmt/core.20200304.090608
Mar 04 09:06:08 jdun-ub-1 java[18063]: IBM Java[18063]: JVMDUMP032I JVM requested System dump using '/var/crash/scalemgmt/cor
Mar 04 09:06:09 jdun-ub-1 java[18063]: JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c
Mar 04 09:06:10 jdun-ub-1 java[18063]: JVMDUMP010I System dump written to /var/crash/scalemgmt/core.20200304.090608.18063.000
Mar 04 09:06:10 jdun-ub-1 java[18063]: JVMDUMP032I JVM requested Java dump using '/var/crash/scalemgmt/javacore.20200304.0906
Mar 04 09:06:10 jdun-ub-1 java[18063]: IBM Java[18063]: JVMDUMP032I JVM requested Java dump using '/var/crash/scalemgmt/javac
Mar 04 09:06:10 jdun-ub-1 kubelet[14349]: I0304 09:06:10.171959   14349 setters.go:73] Using node IP: "172.16.194.157"
Mar 04 09:06:10 jdun-ub-1 java[18063]: JVMDUMP010I Java dump written to /var/crash/scalemgmt/javacore.20200304.090608.18063.0
Mar 04 09:06:10 jdun-ub-1 java[18063]: JVMDUMP032I JVM requested Snap dump using '/var/crash/scalemgmt/Snap.20200304.090608.1
Mar 04 09:06:10 jdun-ub-1 java[18063]: IBM Java[18063]: JVMDUMP032I JVM requested Snap dump using '/var/crash/scalemgmt/Snap.
Mar 04 09:06:10 jdun-ub-1 java[18063]: JVMDUMP010I Snap dump written to /var/crash/scalemgmt/Snap.20200304.090608.18063.0003.
Mar 04 09:06:10 jdun-ub-1 java[18063]: JVMDUMP007I JVM Requesting JIT dump using '/var/crash/scalemgmt/jitdump.20200304.09060
Mar 04 09:06:10 jdun-ub-1 java[18063]: JVMDUMP010I JIT dump written to /var/crash/scalemgmt/jitdump.20200304.090608.18063.000
Mar 04 09:06:10 jdun-ub-1 java[18063]: JVMDUMP013I Processed dump event "abort", detail "".
mew2057 commented 4 years ago

Tried to reinstall the GUI again and I still can't get this to work. Does the GUI have known issues on Ubuntu?

mew2057 commented 4 years ago

https://bugzilla.redhat.com/show_bug.cgi?id=1270616

mew2057 commented 4 years ago

Okay, well this is interesting:

root@jdun-ub-1:~# cat /usr/lib/systemd/system/gpfsgui.service
##############################################################################
#
# Licensed Materials - Property of IBM
#
# (C) COPYRIGHT International Business Machines Corp. 2014, 2019
# All Rights Reserved
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
#
##############################################################################
[Unit]
Description=IBM_Spectrum_Scale Administration GUI
After=syslog.target network.target postgresql.service gpfs.service
Wants=postgresql.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/gpfsgui
WorkingDirectory=/opt/ibm/wlp
ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4pgsql
ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4iptables
ExecStartPre=/usr/lpp/mmfs/gui/bin-sudo/check4sudoers
ExecStartPre=-/usr/lpp/mmfs/gui/bin-sudo/cleanupdumps
ExecStart=/usr/lpp/mmfs/java/jre/bin/java -XX:+HeapDumpOnOutOfMemoryError -Djava.library.path=/opt/ibm/wlp/usr/servers/gpfsgui/lib/ -javaagent:/opt/ibm/wlp/bin/tools/ws-javaagent.jar -jar /opt/ibm/wlp/bin/tools/ws-server.jar gpfsgui --clean
ExecStopPost=/usr/lpp/mmfs/gui/bin-sudo/cleanupiptables
NotifyAccess=main
SuccessExitStatus=0 137 143
TimeoutStartSec=300
TimeoutStopSec=180
WatchdogSec=60
Restart=on-failure
CPUQuota=200%
MemoryLimit=2G

User=scalemgmt
PermissionsStartOnly=true

[Install]
WantedBy=multi-user.target
mew2057 commented 4 years ago
Mar 04 11:18:08 jdun-ub-1 systemd[1]: gpfsgui.service: Watchdog timeout (limit 1min)!
Mar 04 11:18:08 jdun-ub-1 java[23640]: JVMDUMP039I Processing dump event "abort", detail "" at 2020/03/04 11:18:08 - please w
Mar 04 11:18:08 jdun-ub-1 java[23640]: JVMDUMP032I JVM requested System dump using '/var/crash/scalemgmt/core.20200304.111808
Mar 04 11:18:08 jdun-ub-1 java[23640]: IBM Java[23640]: JVMDUMP039I Processing dump event "abort", detail "" at 2020/03/04 11
Mar 04 11:18:08 jdun-ub-1 java[23640]: IBM Java[23640]: JVMDUMP032I JVM requested System dump using '/var/crash/scalemgmt/cor
Mar 04 11:18:09 jdun-ub-1 java[23640]: JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c
Mar 04 11:18:09 jdun-ub-1 java[23640]: JVMDUMP010I System dump written to /var/crash/scalemgmt/core.20200304.111808.23640.000
Mar 04 11:18:09 jdun-ub-1 java[23640]: JVMDUMP032I JVM requested Java dump using '/var/crash/scalemgmt/javacore.20200304.1118
Mar 04 11:18:09 jdun-ub-1 java[23640]: IBM Java[23640]: JVMDUMP032I JVM requested Java dump using '/var/crash/scalemgmt/javac
Mar 04 11:18:10 jdun-ub-1 java[23640]: JVMDUMP010I Java dump written to /var/crash/scalemgmt/javacore.20200304.111808.23640.0
Mar 04 11:18:10 jdun-ub-1 java[23640]: JVMDUMP032I JVM requested Snap dump using '/var/crash/scalemgmt/Snap.20200304.111808.2
Mar 04 11:18:10 jdun-ub-1 java[23640]: IBM Java[23640]: JVMDUMP032I JVM requested Snap dump using '/var/crash/scalemgmt/Snap.
Mar 04 11:18:10 jdun-ub-1 java[23640]: JVMDUMP010I Snap dump written to /var/crash/scalemgmt/Snap.20200304.111808.23640.0003.
Mar 04 11:18:10 jdun-ub-1 java[23640]: JVMDUMP007I JVM Requesting JIT dump using '/var/crash/scalemgmt/jitdump.20200304.11180
Mar 04 11:18:10 jdun-ub-1 java[23640]: JVMDUMP010I JIT dump written to /var/crash/scalemgmt/jitdump.20200304.111808.23640.000
Mar 04 11:18:10 jdun-ub-1 java[23640]: JVMDUMP013I Processed dump event "abort", detail "".
Mar 04 11:18:10 jdun-ub-1 systemd[1]: gpfsgui.service: Main process exited, code=exited, status=1/FAILURE
Mar 04 11:18:10 jdun-ub-1 cleanupiptables[25495]: Found httpPort defined in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml : 470
Mar 04 11:18:10 jdun-ub-1 cleanupiptables[25495]: Found httpsPort defined in /opt/ibm/wlp/usr/servers/gpfsgui/server.xml : 47
Mar 04 11:18:10 jdun-ub-1 cleanupiptables[25495]: Update of IP tables disabled.
Mar 04 11:18:10 jdun-ub-1 systemd[1]: gpfsgui.service: Unit entered failed state.
Mar 04 11:18:10 jdun-ub-1 systemd[1]: gpfsgui.service: Failed with result 'exit-code'.
Mar 04 11:18:10 jdun-ub-1 systemd[1]: gpfsgui.service: Service hold-off time over, scheduling restart.
Mar 04 11:18:10 jdun-ub-1 systemd[1]: Stopped IBM_Spectrum_Scale Administration GUI.
-- Subject: Unit gpfsgui.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit gpfsgui.service has finished shutting down.
mew2057 commented 4 years ago

I think this is the source of the failure:

scalemgmt : command not allowed ; TTY=unknown ; PWD=/opt/ibm/wlp ; USER=root ; COMMAND=mmvdisk recoverygroup list -Y
mew2057 commented 4 years ago

I cheated a little and added a dummy /usr/lpp/mmfs/bin/mmvdisk, which is just an empty executable, and the log expanded:

Mar 05 06:39:10 jdun-ub-1 sudo[14318]: scalemgmt : TTY=unknown ; PWD=/opt/ibm/wlp ; USER=root ; COMMAND=/usr/lpp/mmfs/bin/mmvdisk recoverygroup list -Y
Mar 05 06:39:10 jdun-ub-1 sudo[14318]: pam_unix(sudo:session): session opened for user root by (uid=0)
Mar 05 06:39:10 jdun-ub-1 sudo[14318]: pam_unix(sudo:session): session closed for user root
Mar 05 06:39:10 jdun-ub-1 java[13388]: Detected Platform is UNKNOWN because no recevery groups are available.
Mar 05 06:39:10 jdun-ub-1 java[13388]: Failed to detected platform. Try to get platform from CCR instead (in case it was already detected earlier).
Mar 05 06:39:10 jdun-ub-1 java[13388]: Successfully got platform from CCR: unknown
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task DISK_USAGE to run every 1 day at 030000
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task DISKS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task FILESETS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task FILESYSTEM_MOUNT to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task FILESYSTEMS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task FILE_AUDIT_LOG_CONFIG to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task GUI_CONFIG_CHECK to run every 720 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task GPFS_JOBS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task DIGEST_NOTIFICATION_TASK to run every 1 day at 041500
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task HEALTH_STATES to run every 50 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register not scheduled task HEALTH_TRIGGERED
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task HOST_STATES to run every 15 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task HOST_STATES_CLIENTS to run every 180 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register not scheduled task INODES
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task KEYSTORE to run every 720 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task LOG_REMOVER to run every 360 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task MASTER_GUI_ELECTION to run every 1 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task MOUNT_CONFIG to run every 720 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task POLICIES to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task QUOTA to run every 1 day at 021500
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task QUOTA_DEFAULTS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task QUOTA_ID_RESOLVE to run every 10080 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task QUOTA_MAIL to run every 1 day at 050000
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task RDMA_INTERFACES to run every 720 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task REMOTE_CONFIG to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task REMOTE_CLUSTER to run every 10 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task REMOTE_FILESETS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task REMOTE_GPFS_CONFIG to run every 180 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task REMOTE_HEALTH_STATES to run every 15 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task SMB_GLOBALS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task SMB_SHARES to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task SNAPSHOTS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register not scheduled task SNAPSHOTS_FS_USAGE
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task SNAPSHOT_MANAGER to run every 1 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task SQL_STATISTICS to run every 1 day at 000100
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register not scheduled task STATE_MAIL
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task STORAGE_POOL to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task SYSTEMUTIL_DF to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task TCT_ACCOUNT to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task TCT_CLOUD_SERVICE to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task AFM_FILESET_STATE to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task AFM_NODE_MAPPING to run every 720 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task ALTER_HOST_NAME to run every 720 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task CALLBACK to run every 360 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task CALLHOME to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task CALLHOME_STATUS to run every 1440 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task CAPACITY_LICENSE to run every 360 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task TCT_NODECLASS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task THRESHOLDS to run every 180 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register not scheduled task TASK_CHAIN
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task CES_ADDRESS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task CES_STATE to run every 10 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task CES_SERVICE_STATE to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task CES_USER_AUTH_SERVICE to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task CLUSTER_CONFIG to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task CONNECTION_STATUS to run every 10 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task DAEMON_CONFIGURATION to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task DF to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task NFS_EXPORTS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task NFS_EXPORTS_DEFAULTS to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task NFS_SERVICE to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task NODE_LICENSE to run every 360 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task NODECLASS to run every 360 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task OBJECT_STORAGE_POLICY to run every 360 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task OS_DETECT to run every 360 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task PM_MONITOR to run every 10 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task PM_SENSORS to run every 360 minutes
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) schedulePeriodicTasks: register and schedule task PM_TOPOLOGY to run every 60 minutes
Mar 05 06:39:10 jdun-ub-1 sudo[14326]: scalemgmt : TTY=unknown ; PWD=/opt/ibm/wlp ; USER=root ; COMMAND=/usr/lpp/mmfs/bin/mmccr vget gui_hide_events
Mar 05 06:39:10 jdun-ub-1 sudo[14326]: pam_unix(sudo:session): session opened for user root by (uid=0)
Mar 05 06:39:10 jdun-ub-1 sudo[14326]: pam_unix(sudo:session): session closed for user root
Mar 05 06:39:10 jdun-ub-1 java[13388]: (Startup) 141ms Background tasks started.
Mar 05 06:39:10 jdun-ub-1 java[13388]: Systems Management JVM environment runtime:
Mar 05 06:39:10 jdun-ub-1 java[13388]:  Free memory in the JVM: 56MB
Mar 05 06:39:10 jdun-ub-1 java[13388]:  Total memory in the JVM: 111MB
Mar 05 06:39:10 jdun-ub-1 java[13388]:  Available memory in the JVM: 1481MB
Mar 05 06:39:10 jdun-ub-1 java[13388]:  Max memory that the JVM will attempt to use: 1536MB
Mar 05 06:39:10 jdun-ub-1 java[13388]:  Number of processors available to JVM: 2
Mar 05 06:39:11 jdun-ub-1 kubelet[14349]: I0305 06:39:11.302681   14349 setters.go:73] Using node IP: "172.16.194.157"
Mar 05 06:39:21 jdun-ub-1 kubelet[14349]: I0305 06:39:21.316562   14349 setters.go:73] Using node IP: "172.16.194.157"
Mar 05 06:39:31 jdun-ub-1 kubelet[14349]: I0305 06:39:31.327338   14349 setters.go:73] Using node IP: "172.16.194.157"
Mar 05 06:39:41 jdun-ub-1 kubelet[14349]: I0305 06:39:41.337162   14349 setters.go:73] Using node IP: "172.16.194.157"
Mar 05 06:39:46 jdun-ub-1 kubelet[14349]: I0305 06:39:46.549661   14349 kubelet_network_linux.go:111] Not using `--random-fully` in the MASQUERADE rule for iptables because the local version of iptables does not support it
Mar 05 06:39:51 jdun-ub-1 kubelet[14349]: I0305 06:39:51.352691   14349 setters.go:73] Using node IP: "172.16.194.157"
Mar 05 06:40:01 jdun-ub-1 kubelet[14349]: I0305 06:40:01.366733   14349 setters.go:73] Using node IP: "172.16.194.157"
Mar 05 06:40:01 jdun-ub-1 CRON[15220]: pam_unix(cron:session): session opened for user root by (uid=0)
Mar 05 06:40:01 jdun-ub-1 CRON[15221]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 05 06:40:01 jdun-ub-1 CRON[15220]: pam_unix(cron:session): session closed for user root
Mar 05 06:40:03 jdun-ub-1 systemd[1]: gpfsgui.service: Watchdog timeout (limit 1min)!
mew2057 commented 4 years ago

Just realized I was misreading 1 min as 1 second.

mew2057 commented 4 years ago

This is the resolution for this issue:

The main issue is that there is no port forwarding rule for the GUI (47443 to 443) and UPDATE_IPTABLES is disabled in /etc/sysconfig/gpfsgui. As a result the GUI cannot reach itself on port 443, so it never pings systemd, and systemd kills it after 60 seconds. I have set WatchdogSec to 0 in /usr/lib/systemd/system/gpfsgui.service, so the GUI keeps running now.
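
For anyone repeating this, here is a minimal sketch of an alternative to editing the packaged unit file: a systemd drop-in that sets the same WatchdogSec=0. This is not what was done above (the unit file was edited directly). The helper name write_watchdog_override and the file name override.conf are illustrative assumptions. The drop-in directory convention (/etc/systemd/system/<unit>.d/) is standard systemd behaviour.

```shell
# Sketch: write a systemd drop-in that disables the watchdog for a unit.
# write_watchdog_override is a hypothetical helper name; override.conf is
# an arbitrary file name (any *.conf in the drop-in directory is read).
write_watchdog_override() {
    dropin_dir="$1"                      # e.g. /etc/systemd/system/gpfsgui.service.d
    mkdir -p "$dropin_dir" || return 1
    # WatchdogSec=0 turns the service watchdog off, same value as used above.
    printf '[Service]\nWatchdogSec=0\n' > "$dropin_dir/override.conf"
}
```

Assumed usage, as root: `write_watchdog_override /etc/systemd/system/gpfsgui.service.d && systemctl daemon-reload && systemctl restart gpfsgui`. A drop-in would likely survive a package update that replaces the unit file, whereas a direct edit under /usr/lib may not.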

mew2057 commented 4 years ago

Using new custom resource:

root@jdun-ub-1:~# cat ibm-spectrum-scale-csi-operator-cr.yaml
apiVersion: csi.ibm.com/v1
kind: 'CSIScaleOperator'
metadata:
    name: 'ibm-spectrum-scale-csi'
    namespace: 'ibm-spectrum-scale-csi-driver'
    labels:
      app.kubernetes.io/name: ibm-spectrum-scale-csi-operator
      app.kubernetes.io/instance: ibm-spectrum-scale-csi-operator
      app.kubernetes.io/managed-by: ibm-spectrum-scale-csi-operator
      release: ibm-spectrum-scale-csi-operator
status: {}
spec:
# The path to the GPFS file system mounted on the host machine.
# ==================================================================================
  scaleHostpath: "/ibm/fs1"
  attacher: 'quay.io/k8scsi/csi-attacher:v1.0.0'
  provisioner: 'quay.io/k8scsi/csi-provisioner:v1.5.0'
  driverRegistrar: "quay.io/k8scsi/csi-node-driver-registrar:v1.0.1"
  #spectrumScale: "quay.io/ibm-spectrum-scale/ibm-spectrum-scale-csi-driver:v1.0.0"
  nodeMapping:
    - k8sNode: "jdun-ub-1"
      spectrumscaleNode: "jdun-ub-1.fyre.ibm.com"
    - k8sNode: "jdun-ub-2"
      spectrumscaleNode: "jdun-ub-2.fyre.ibm.com"
    - k8sNode: "jdun-ub-3"
      spectrumscaleNode: "jdun-ub-3.fyre.ibm.com"
  clusters:
    - id: "16807700128967085992"
      secrets: "csisecret"
      secureSslMode: false
      primary:
        primaryFs: "fs1"
        primaryFset: "csiFset1"
      restApi:
        - guiHost: "jdun-ub-1.fyre.ibm.com"
          guiPort: 47443
mew2057 commented 4 years ago

Everything comes up now:

root@jdun-ub-1:~# kubectl get pods -n ibm-spectrum-scale-csi-driver
NAME                                               READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-csi-4v4zx                       2/2     Running   0          50s
ibm-spectrum-scale-csi-attacher-0                  1/1     Running   7          25m
ibm-spectrum-scale-csi-f8qrb                       2/2     Running   0          51s
ibm-spectrum-scale-csi-operator-5f5f549754-n4flh   2/2     Running   0          2d16h
ibm-spectrum-scale-csi-provisioner-0               1/1     Running   0          25m
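
A quick way to make the "everything comes up" check scriptable, as a rough sketch: filter the `kubectl get pods` table for any pod whose STATUS column is not Running. The helper name not_running_pods is hypothetical, and it assumes the default five-column table layout shown above.

```shell
# Sketch: read a `kubectl get pods` table on stdin and print the names of
# pods whose STATUS (third column) is anything other than Running.
# not_running_pods is a hypothetical helper, not a kubectl feature.
not_running_pods() {
    # NR > 1 skips the header row.
    awk 'NR > 1 && $3 != "Running" { print $1 }'
}
```

Assumed usage: `kubectl get pods -n ibm-spectrum-scale-csi-driver | not_running_pods`. Empty output means every listed pod reports Running.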
mew2057 commented 4 years ago

Setting up the Storage Class and Persistent Volume Claim:

root@jdun-ub-1:~# cat storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: ibm-spectrum-scale-csi-lt
provisioner: spectrumscale.csi.ibm.com
parameters:
    volBackendFs: "fs1"
    clusterId: "16807700128967085992"
    uid: "1000"
    gid: "1000"
root@jdun-ub-1:~# cat pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ibm-spectrum-scale-csi-lt
root@jdun-ub-1:~# kubectl apply -f storage-class.yaml
storageclass.storage.k8s.io/ibm-spectrum-scale-csi-lt created
root@jdun-ub-1:~# kubectl apply  -f  pvc.yaml
persistentvolumeclaim/my-pvc created
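
Before creating a pod against the claim, it can help to wait until the PVC is actually bound. A minimal sketch, assuming the claim name my-pvc from above: wait_for_pvc_bound is a hypothetical helper, and the retry count and delay are arbitrary defaults.

```shell
# Sketch: poll a PVC until it reports the Bound phase or the retries run out.
# wait_for_pvc_bound is a hypothetical helper, not part of kubectl.
wait_for_pvc_bound() {
    pvc="$1"
    tries="${2:-30}"   # number of polls before giving up (arbitrary default)
    delay="${3:-2}"    # seconds between polls (arbitrary default)
    i=0
    while [ "$i" -lt "$tries" ]; do
        phase="$(kubectl get pvc "$pvc" -o jsonpath='{.status.phase}' 2>/dev/null)"
        if [ "$phase" = "Bound" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1           # never reached Bound within the allowed polls
}
```

Assumed usage: `wait_for_pvc_bound my-pvc && kubectl apply -f pod.yaml`.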
mew2057 commented 4 years ago

Create the test pod:

root@jdun-ub-1:~# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: csi-scale-fsetdemo-pod
  labels:
    app: nginx
spec:
  containers:
   - name: web-server
     image: nginx
     volumeMounts:
       - name: mypvc
         mountPath: /usr/share/nginx/html/scale
     ports:
     - containerPort: 80
  volumes:
   - name: mypvc
     persistentVolumeClaim:
       claimName: my-pvc
       readOnly: false
root@jdun-ub-1:~# kubectl apply  -f  pod.yaml
pod/csi-scale-fsetdemo-pod created
mew2057 commented 4 years ago
root@jdun-ub-1:~# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
csi-scale-fsetdemo-pod   1/1     Running   0          92s

root@jdun-ub-1:~# kubectl exec csi-scale-fsetdemo-pod  -it bash
root@csi-scale-fsetdemo-pod:/# cd  /usr/share/nginx/html/scale
root@csi-scale-fsetdemo-pod:/usr/share/nginx/html/scale# ls
root@csi-scale-fsetdemo-pod:/usr/share/nginx/html/scale# echo "testing123"  > test
root@csi-scale-fsetdemo-pod:/usr/share/nginx/html/scale# exit
exit
root@jdun-ub-1:~# cat /ibm/fs1/pvc-a2cb5144-5b2d-431f-acd8-950a598df756/pvc-a2cb5144-5b2d-431f-acd8-950a598df756-data/test
testing123

Looks like things are working now.