ssirag commented 5 years ago

I'm trying to install Gluster daemon hyperconverged on a completely fresh Kubernetes cluster on 3 Ubuntu 18.04 nodes, which are also the only nodes in the Kubernetes cluster (all masters).

I've done the following:

Configured (but not initialized) the /dev/sdb drive on each host for exclusive Gluster use.
Installed Gluster (client only) using apt install glusterfs-client (version 4.1.6)
Made sure the module and firewall requirements listed here are loaded: https://github.com/gluster/gluster-kubernetes/blob/master/docs/setup-guide.md
Created topology file
Run ./gk-deploy -gv

The end result (see verbose output in attachment) Gk-deploy verbose output.txt is 3 glusterfs pods showing this in their kubectl describe output:

`Events: Type Reason Age From Message

Normal Pulled 19m kubelet, vva-er-k8s0 Container image "gluster/gluster-centos:latest" already present on machine Normal Created 19m kubelet, vva-er-k8s0 Created container Normal Started 19m kubelet, vva-er-k8s0 Started container Warning Unhealthy 14m (x11 over 18m) kubelet, vva-er-k8s0 Readiness probe failed: /usr/local/bin/status-probe.sh failed check: systemctl -q is-active gluster-blockd.service Warning Unhealthy 4m10s (x35 over 18m) kubelet, vva-er-k8s0 Liveness probe failed: /usr/local/bin/status-probe.sh failed check: systemctl -q is-active gluster-blockd.service`

If I enter one of the containers, I can see the following:

input: systemctl -q

output:
glusterd.service loaded active running GlusterFS, a clustered file-system server

input: cat /var/log/glusterfs/glusterd.log

output: [root@vva-er-k8s0 glusterfs]# cat glusterd.log [2018-11-26 20:49:11.437520] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 4.1.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) [2018-11-26 20:49:11.441727] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536 [2018-11-26 20:49:11.441752] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory [2018-11-26 20:49:11.441759] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory [2018-11-26 20:49:11.446193] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device] [2018-11-26 20:49:11.446212] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device [2018-11-26 20:49:11.446220] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed [2018-11-26 20:49:11.446289] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed [2018-11-26 20:49:11.446299] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport [2018-11-26 20:49:12.457991] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2018-11-26 20:49:12.458038] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2018-11-26 20:49:12.458040] I [MSGID: 106514] [glusterd-store.c:2262:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 40100 [2018-11-26 20:49:12.468372] I [MSGID: 106194] [glusterd-store.c:3849:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. Final graph: +------------------------------------------------------------------------------+ 1: volume management 2: type mgmt/glusterd 3: option rpc-auth.auth-glusterfs on 4: option rpc-auth.auth-unix on 5: option rpc-auth.auth-null on 6: option rpc-auth-allow-insecure on 7: option transport.listen-backlog 10 8: option event-threads 1 9: option ping-timeout 0 10: option transport.socket.read-fail-log off 11: option transport.socket.keepalive-interval 2 12: option transport.socket.keepalive-time 10 13: option transport-type rdma 14: option working-directory /var/lib/glusterd 15: end-volume 16: +------------------------------------------------------------------------------+ [2018-11-26 20:49:12.468685] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2018-11-26 21:10:59.076690] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 4.1.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) [2018-11-26 21:10:59.080303] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536 [2018-11-26 21:10:59.080327] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory [2018-11-26 21:10:59.080335] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory [2018-11-26 21:10:59.084469] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device] [2018-11-26 21:10:59.084488] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device [2018-11-26 21:10:59.084496] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed [2018-11-26 21:10:59.084565] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed [2018-11-26 21:10:59.084575] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport [2018-11-26 21:11:00.027597] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2018-11-26 21:11:00.027644] E [MSGID: 101032] [store.c:441:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2018-11-26 21:11:00.027646] I [MSGID: 106514] [glusterd-store.c:2262:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 40100 [2018-11-26 21:11:00.034982] I [MSGID: 106194] [glusterd-store.c:3849:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. Final graph: +------------------------------------------------------------------------------+ 1: volume management 2: type mgmt/glusterd 3: option rpc-auth.auth-glusterfs on 4: option rpc-auth.auth-unix on 5: option rpc-auth.auth-null on 6: option rpc-auth-allow-insecure on 7: option transport.listen-backlog 10 8: option event-threads 1 9: option ping-timeout 0 10: option transport.socket.read-fail-log off 11: option transport.socket.keepalive-interval 2 12: option transport.socket.keepalive-time 10 13: option transport-type rdma 14: option working-directory /var/lib/glusterd 15: end-volume 16: +------------------------------------------------------------------------------+ [2018-11-26 21:11:00.035295] I [MSGID: 101190] [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

Every few minutes, docker kills and recreates the gluster container.

What am I missing here?

nixpanic commented 5 years ago

The systemctl error suggests gluster-blockd fails to start. It is not clear from the logs which you posted what the problem could be. You should check /var/log/glusterfs/gluster-block on the node where the pod is failing, hopefully those logs have an explanation.

If you are not planning to use gluster-block backed volumes, you can also edit the deploy/kube-templates/glusterfs-daemonset.yaml file and set GLUSTER_BLOCKD_STATUS_PROBE_ENABLE to "0". But that would be a workaround and not a real fix. If we can find the cause of the problem, that would be much better.

cc: @pkalever

ssirag commented 5 years ago

@nixpanic the only file in the /var/log/glusterfs/gluster-block directory (k8s_glusterfs_glusterfs-6mwkg container) is named tcmu-runner.log, and the only line in that log is this:

2018-11-27 05:50:59.011 59 [INFO] dyn_config_start:409: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS

Anything else I should be looking at?

jiayiwang7 commented 5 years ago

@nixpanic the only file in the /var/log/glusterfs/gluster-block directory (k8s_glusterfs_glusterfs-6mwkg container) is named tcmu-runner.log, and the only line in that log is this:

2018-11-27 05:50:59.011 59 [INFO] dyn_config_start:409: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS

Anything else I should be looking at?

+1

ssirag commented 5 years ago

@nixpanic This seems from the logs to be the critical point of failure...

[2018-11-28 20:00:37.522034] I [MSGID: 100030] [glusterfsd.c:2741:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 4.1.5 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) [2018-11-28 20:00:37.526980] I [MSGID: 106478] [glusterd.c:1423:init] 0-management: Maximum allowed open file descriptors set to 65536 [2018-11-28 20:00:37.527007] I [MSGID: 106479] [glusterd.c:1481:init] 0-management: Using /var/lib/glusterd as working directory [2018-11-28 20:00:37.527014] I [MSGID: 106479] [glusterd.c:1486:init] 0-management: Using /var/run/gluster as pid file working directory

[2018-11-28 20:00:37.532662] W [MSGID: 103071] [rdma.c:4629:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device] [2018-11-28 20:00:37.532682] W [MSGID: 103055] [rdma.c:4938:init] 0-rdma.management: Failed to initialize IB Device [2018-11-28 20:00:37.532690] W [rpc-transport.c:351:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed [2018-11-28 20:00:37.532764] W [rpcsvc.c:1781:rpcsvc_create_listener] 0-rpc-service: cannot create listener, initing the transport failed [2018-11-28 20:00:37.532775] E [MSGID: 106244] [glusterd.c:1764:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport

ssirag commented 5 years ago

Looks like it may be the readiness check itself... quoting from the glusterfs-daemonset.yaml file:

exec: command:

"/bin/bash"

"-c"

"if command -v /usr/local/bin/status-probe.sh; then /usr/local/bin/status-probe.sh readiness; else systemctl status glusterd.service; fi"

I tested both commands in the Gluster container, with these results:

[root@vva-er-k8s1 gluster-block]# systemctl status glusterd.service ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Active: active (running) since Wed 2018-11-28 20:00:38 UTC; 11min ago Process: 75 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 76 (glusterd) CGroup: /kubepods/burstable/pod458cea8b-f348-11e8-ae94-00155d14c973/1fd0ee280b300c0240334127b05d9e419af27866f025fd91263590e0bc47cc1c/system.slice/glusterd.service └─76 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Nov 28 20:00:37 vva-er-k8s1 systemd[1]: Starting GlusterFS, a clustered file-system server... Nov 28 20:00:38 vva-er-k8s1 systemd[1]: Started GlusterFS, a clustered file-system server. [root@vva-er-k8s1 gluster-block]# /usr/local/bin/status-probe.sh warning: no mode provided. Assuming liveness probe failed check: systemctl -q is-active gluster-blockd.service [root@vva-er-k8s1 gluster-block]# systemctl -q is-active gluster-blockd.service [root@vva-er-k8s1 gluster-block]#

"systemctl -q is-active gluster-blockd.service" fails to show anything.

If I run "systemctl is-active gluster-blockd.service" it shows "inactive".

If I run "systemctl -l | grep gluster" I don't even see the gluster-blockd.service:

etc-glusterfs.mount loaded active mounted /etc/glusterfs var-lib-glusterd.mount loaded active mounted /var/lib/glusterd var-lib-misc-glusterfsd.mount loaded active mounted /var/lib/misc/glusterfsd var-log-glusterfs.mount loaded active mounted /var/log/glusterfs gluster-check-diskspace.service loaded active running Check glusterd config directory full glusterd.service loaded active running GlusterFS, a clustered file-system server

ssirag commented 5 years ago

Sorry for multiple posts... I disabled the health check, cleaned and redeployed, and am now not having an issue with the gluster deployment and operation (confirmed data is moving from host to host), except for the disabled health check.

nixpanic commented 5 years ago

When in the container, can you try to start the gluster-blockd.service? It would be most interesting to see how that works out. The following commands should help you with that:

# systemctl status gluster-blockd.service
# systemctl start gluster-blockd.service
# systemctl status gluster-blockd.service

Without error messages why the service does not start (or maybe it terminates quickly?), it will be difficult to guess what the problem is.

ssirag commented 5 years ago

Here's what I get on status:

[root@vva-er-k8s0 /]# systemctl status gluster-blockd.service ● gluster-blockd.service - Gluster block storage utility Loaded: loaded (/usr/lib/systemd/system/gluster-blockd.service; enabled; vendor preset: disabled) Active: inactive (dead)

Nov 28 21:03:40 vva-er-k8s0 systemd[1]: Dependency failed for Gluster block storage utility. Nov 28 21:03:40 vva-er-k8s0 systemd[1]: Job gluster-blockd.service/start failed with result 'dependency'.

What I don't understand is this: I have a fully functional Gluster-heketi system now working on the cluster, without that service, and with no health check; but i can enter any of the Gluster pods, check out the brick and find my test data replicated. I walked through this "Hello World" tutorial without issue:

https://github.com/gluster/gluster-kubernetes/blob/master/docs/examples/hello_world/README.md

What's really the purpose of gluster-blockd.service? It seems irrelevant.

phlogistonjohn commented 5 years ago

If you're not using block volumes (gluster-block) then you can get away with not running the service. If you tried to use any of the heketi support for block volumes (heketi-cli blockvolume create..., etc) these commands would fail. Gluster block allows you to create iscsi block volumes automatically within gluster volumes. This is useful for certain classes of applications that may have issues with the normal gluster semantics or performance characteristics.

If you never ever want to use block volumes from gluster/heketi you can probably safely ignore this problem.

However, if you wish to debug further I suggest dumping all of the systemd service statuses from within the pod and looking at those. I'm guessing something blockd depends on failed.

ssirag commented 5 years ago

I can see a number of disabled services (listed below), but am having issues tracing the dependencies. Each status only shows that a dependency failed, doesn't say which one.

tmp.mount disabled brandbot.path disabled autovt@.service disabled blk-availability.service disabled console-getty.service disabled console-shell.service disabled debug-shell.service disabled getty@.service disabled gluster-block-target.service disabled glusterfsd.service disabled glusterfssharedstorage.service disabled gssproxy.service disabled nfs-blkmap.service disabled nfs-rquotad.service disabled nfs-server.service disabled nfs.service disabled rdisc.service disabled rdma.service disabled rhel-autorelabel-mark.service disabled rpc-rquotad.service disabled rsyncd.service disabled serial-getty@.service disabled systemd-bootchart.service disabled systemd-nspawn@.service disabled systemd-readahead-collect.service disabled systemd-readahead-drop.service disabled systemd-readahead-replay.service disabled target.service disabled tcmu-runner.service disabled rsyncd.socket disabled sshd.socket disabled ctrl-alt-del.target disabled halt.target disabled kexec.target disabled machines.target disabled poweroff.target disabled reboot.target disabled remote-cryptsetup.target disabled remote-fs.target disabled rescue.target disabled runlevel0.target disabled runlevel1.target disabled runlevel6.target disabled fstrim.timer disabled

ssirag commented 5 years ago

This one might be the culprit? Searching on "Failed to start LIO Userspace-passthrough daemon" points to bugfix threads.

systemctl status tcmu-runner.service

● tcmu-runner.service - LIO Userspace-passthrough daemon Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; disabled; vendor preset: disabled) Active: failed (Result: exit-code) since Mon 2018-12-03 17:22:14 UTC; 8s ago Process: 5151 ExecStart=/usr/bin/tcmu-runner (code=exited, status=1/FAILURE) Main PID: 5151 (code=exited, status=1/FAILURE)

Dec 03 17:22:14 vva-er-k8s0 systemd[1]: Starting LIO Userspace-passthrough daemon... Dec 03 17:22:14 vva-er-k8s0 tcmu-runner[5151]: The logdir option from the tcmu.conf will be ignored Dec 03 17:22:14 vva-er-k8s0 tcmu-runner[5151]: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS Dec 03 17:22:14 vva-er-k8s0 systemd[1]: tcmu-runner.service: main process exited, code=exited, status=1/FAILURE Dec 03 17:22:14 vva-er-k8s0 systemd[1]: Failed to start LIO Userspace-passthrough daemon. Dec 03 17:22:14 vva-er-k8s0 systemd[1]: Unit tcmu-runner.service entered failed state. Dec 03 17:22:14 vva-er-k8s0 systemd[1]: tcmu-runner.service failed.

nixpanic commented 5 years ago

Indeed, gluster-block depends on tcmu-runner. I expect that if tcmu-runner fails, gluster-blockd also goes into a failed state. /var/log/glusterfs/gluster-block/tcmu-runner.log might contain details why it failed.

ssirag commented 5 years ago

The only thing in that log file is this:

2018-12-03 17:19:49.281 4955 [INFO] dyn_config_start:409: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS

maurya-m commented 5 years ago

Having the same issue as @ssirag; - did anyone else find the solution for the systemctl start gluster-blockd.service failure?

renich commented 5 years ago

I got this from inside the of one of the glusterfs pods:

[root@glusterfs2 /]# tcmu-runner -dd
The logdir option from the tcmu.conf will be ignored
2019-01-16 03:16:55.012 288 [INFO] dyn_config_start:409: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
2019-01-16 03:16:55.012 288 [DEBUG] main:1018: handler path: /usr/lib64/tcmu-runner
2019-01-16 03:16:55.013 288 [INFO] load_our_module:498: no modules directory '/lib/modules/4.15.0-43-generic', checking module target_core_user entry in '/sys/modules/'
2019-01-16 03:16:55.013 288 [ERROR] load_our_module:503: stat() on '/sys/module/target_core_user' failed: No such file or directory
2019-01-16 03:16:55.013 288 [ERROR] main:1022: couldn't load module

Actually, there is nothing in the /lib/modules directory.

renich commented 5 years ago

Here's another super-odd thing. If I run the command like crazy (several times in a row) I get different messages:

[root@glusterfs2 /]# tcmu-runner -d
The logdir option from the tcmu.conf will be ignored
2019-01-16 03:19:57.426 536 [DEBUG] main:1018: handler path: /usr/lib64/tcmu-runner
2019-01-16 03:19:57.426 536 [INFO] dyn_config_start:409: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
2019-01-16 03:19:57.426 536 [INFO] load_our_module:498: no modules directory '/lib/modules/4.15.0-43-generic', checking module target_core_user entry in '/sys/modules/'
2019-01-16 03:19:57.426 536 [ERROR] load_our_module:503: stat() on '/sys/module/target_core_user' failed: No such file or directory
2019-01-16 03:19:57.426 536 [ERROR] main:1022: couldn't load module
[root@glusterfs2 /]# tcmu-runner -d
The logdir option from the tcmu.conf will be ignored
2019-01-16 03:19:57.667 539 [INFO] dyn_config_start:409: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
2019-01-16 03:19:57.667 539 [DEBUG] main:1018: handler path: /usr/lib64/tcmu-runner
2019-01-16 03:19:57.667 539 [INFO] load_our_module:498: no modules directory '/lib/modules/4.15.0-43-generic', checking module target_core_user entry in '/sys/modules/'
[root@glusterfs2 /]# tcmu-runner -d
The logdir option from the tcmu.conf will be ignored
2019-01-16 03:19:57.906 542 [DEBUG] main:1018: handler path: /usr/lib64/tcmu-runner
2019-01-16 03:19:57.906 542 [INFO] dyn_config_start:409: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
[root@glusterfs2 /]# tcmu-runner -d
The logdir option from the tcmu.conf will be ignored
2019-01-16 03:19:58.161 545 [DEBUG] main:1018: handler path: /usr/lib64/tcmu-runner
2019-01-16 03:19:58.161 545 [INFO] dyn_config_start:409: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
2019-01-16 03:19:58.162 545 [INFO] load_our_module:498: no modules directory '/lib/modules/4.15.0-43-generic', checking module target_core_user entry in '/sys/modules/'
2019-01-16 03:19:58.162 545 [ERROR] load_our_module:503: stat() on '/sys/module/target_core_user' failed: No such file or directory
2019-01-16 03:19:58.162 545 [ERROR] main:1022: couldn't load module
[root@glusterfs2 /]# tcmu-runner -d
The logdir option from the tcmu.conf will be ignored
Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
2019-01-16 03:19:58.417 548 [DEBUG] main:1018: handler path: /usr/lib64/tcmu-runner
[root@glusterfs2 /]# tcmu-runner -d
The logdir option from the tcmu.conf will be ignored
Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
2019-01-16 03:19:58.702 551 [DEBUG] main:1018: handler path: /usr/lib64/tcmu-runner
[root@glusterfs2 /]# tcmu-runner -d
The logdir option from the tcmu.conf will be ignored
[root@glusterfs2 /]# tcmu-runner -d
The logdir option from the tcmu.conf will be ignored
2019-01-16 03:19:59.179 557 [INFO] dyn_config_start:409: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
2019-01-16 03:19:59.180 557 [DEBUG] main:1018: handler path: /usr/lib64/tcmu-runner

nixpanic commented 5 years ago

On Tue, Jan 15, 2019 at 07:17:31PM -0800, Renich Bon Ciric wrote:

I got this from inside the of one of the glusterfs pods:

[root@glusterfs2 /]# tcmu-runner -dd
The logdir option from the tcmu.conf will be ignored
2019-01-16 03:16:55.012 288 [INFO] dyn_config_start:409: Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS
2019-01-16 03:16:55.012 288 [DEBUG] main:1018: handler path: /usr/lib64/tcmu-runner
2019-01-16 03:16:55.013 288 [INFO] load_our_module:498: no modules directory '/lib/modules/4.15.0-43-generic', checking module target_core_user entry in '/sys/modules/'
2019-01-16 03:16:55.013 288 [ERROR] load_our_module:503: stat() on '/sys/module/target_core_user' failed: No such file or directory
2019-01-16 03:16:55.013 288 [ERROR] main:1022: couldn't load module

Is /lib not a symlink to /usr/lib on Ubuntu (whatever version)? The kernel modules that needs to get loaded (target_core_user and others) need to be available in the container image. We do this by bind-mounting /usr/lib/modules from the host to /usr/lib/modules in the container. But if /lib/modules is needed on Ubuntu, we could consider changing it in the templates.

Can you modify the entry for 'volumes:' and 'volumeMounts:' to use "/lib/modules" in the glusterfs-server pods?

renich commented 5 years ago

@nixpanic no; it seems it isn't.

/usr/lib/modules doesn't exist. I will try to modify the entry and reinstall.

renich commented 5 years ago

And it worked:

diff --git a/deploy/kube-templates/glusterfs-daemonset.yaml b/deploy/kube-templates/glusterfs-daemonset.yaml
index ea07421..c37a5f4 100644
--- a/deploy/kube-templates/glusterfs-daemonset.yaml
+++ b/deploy/kube-templates/glusterfs-daemonset.yaml
@@ -67,7 +67,7 @@ spec:
           mountPath: "/etc/ssl"
           readOnly: true
         - name: kernel-modules
-          mountPath: "/usr/lib/modules"
+          mountPath: "/lib/modules"
           readOnly: true
         securityContext:
           capabilities: {}
@@ -131,4 +131,4 @@ spec:
           path: "/etc/ssl"
       - name: kernel-modules
         hostPath:
-          path: "/usr/lib/modules"
+          path: "/lib/modules"

nixpanic commented 5 years ago

Do you mind sending this as a PR? Maybe also change the openshift template the same way.

In your commit message, please explain that your version of Ubuntu does not have the /lib -> /usr/lib symlink and that loading kernel modules requires the (/usr)/lib/modules directory in the container.

Thanks!

renich commented 5 years ago

Sure thing. I'll send it tomorrow since I I'm about to go to sleep. I'll do this first thing in the morning.

pupapaik commented 5 years ago

I hit same issue on Azure AKS. Can you please merge that PR.

cyrilbkr commented 5 years ago

Same issue here with Debian Stretch. Please merge

mrmarcsmith commented 5 years ago

Same! this is a blocker for us.

Sheldor commented 5 years ago

Same issue here with Centos 7. But tcmu log only shows Inotify is watching "/etc/tcmu/tcmu.conf", wd: 1, mask: IN_ALL_EVENTS No error messages to debug.

mart3051 commented 4 years ago

Same issue here on Amazon EKS

gluster / gluster-kubernetes

Gk-deploy failing on Ubuntu #539

systemctl status tcmu-runner.service