pkalever closed this pull request 5 years ago.
@lxbsz please help review; note the TODO items above, I will add them soon. Thanks!
@pkalever
There are some build warnings:
block_version.c: In function 'glusterBlockBuildMinCaps':
block_version.c:30:3: warning: enumeration value 'RELOAD_SRV' not handled in switch [-Wswitch]
30 | switch (opt) {
| ^~~~~~
CC libgbrpc_la-block_modify.lo
CC libgbrpc_la-block_genconfig.lo
block_genconfig.c: In function 'block_gen_config_cli_1_svc':
block_genconfig.c:66:43: warning: '%s' directive output may be truncated writing up to 255 bytes into a region of size 239 [-Wformat-truncation=]
66 | snprintf(lun_so, 256, "/backstores/user/%s", block);
| ^~
In file included from /usr/include/stdio.h:867,
from ../utils/utils.h:16,
from ../utils/common.h:15,
from block_common.h:16,
from block_genconfig.c:12:
/usr/include/bits/stdio2.h:67:10: note: '__builtin___snprintf_chk' output between 18 and 273 bytes into a destination of size 256
67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
68 | __bos (__s), __fmt, __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CC libgbrpc_la-block_reload.lo
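FWIW, a minimal sketch of how both warnings could be addressed; the enum values and all of the surrounding code below are reconstructions from the warning text, not the actual gluster-block sources:

#include <stdio.h>

/* Hypothetical subset of the option enum used in block_version.c. */
typedef enum { CREATE_SRV, DELETE_SRV, RELOAD_SRV } gbOption;

static void buildMinCapsSketch(gbOption opt)
{
  switch (opt) {
  case CREATE_SRV:
  case DELETE_SRV:
    /* existing handling ... */
    break;
  case RELOAD_SRV:
    /* covering the new enum value (or adding a default:) silences -Wswitch */
    break;
  }
}

int main(void)
{
  char block[256] = "block0"; /* a block name can be up to 255 bytes plus NUL */

  /* Size the destination for the worst case (17-byte prefix + longest name)
   * so the compiler can prove the snprintf output always fits. */
  char lun_so[sizeof("/backstores/user/") + sizeof(block)];

  buildMinCapsSketch(RELOAD_SRV);
  snprintf(lun_so, sizeof(lun_so), "/backstores/user/%s", block);
  printf("%s\n", lun_so);
  return 0;
}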
Thanks.
I just backed up and deleted /etc/target/saveconfig.json, then restarted node.163.
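Roughly, with the same paths as in the detailed steps later in this thread:

# cp /etc/target/saveconfig.json /tmp/
# rm /etc/target/saveconfig.json
# reboot

After the reboot the target configuration is empty: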
# targetcli ls
o- / ......................................................................................................................... [...]
o- backstores .............................................................................................................. [...]
| o- block .................................................................................................. [Storage Objects: 0]
| o- fileio ................................................................................................. [Storage Objects: 0]
| o- pscsi .................................................................................................. [Storage Objects: 0]
| o- ramdisk ................................................................................................ [Storage Objects: 0]
| o- user:fbo ............................................................................................... [Storage Objects: 0]
| o- user:glfs .............................................................................................. [Storage Objects: 0]
| o- user:qcow .............................................................................................. [Storage Objects: 0]
| o- user:zbc ............................................................................................... [Storage Objects: 0]
o- iscsi ............................................................................................................ [Targets: 0]
o- loopback ......................................................................................................... [Targets: 0]
o- vhost ............................................................................................................ [Targets: 0]
o- xen-pvscsi ....................................................................................................... [Targets: 0]
Then, from another node (node.164), I ran the reload, but it failed:
# gluster-block reload repvol/block0
FAILED ON: 192.168.195.163
SUCCESSFUL ON: 192.168.195.162 192.168.195.164
RESULT: FAIL
The logs on node.163 are:
[2019-09-16 00:58:53.264341] INFO: reload request, blockname=block0 filename=5bf22bfa-965a-48c3-8a43-f0d5018e9303 [at block_reload.c+366 :<block_reload_1_svc_st>]
[2019-09-16 00:58:53.683346] ERROR: Block 'block0' may be not loaded. [at block_svc_routines.c+111 :<blockCheckBlockLoadedStatus>]
[2019-09-16 00:58:53.686660] ERROR: Block 'block0' not loaded. [at block_svc_routines.c+162 :<blockCheckBlockLoadedStatus>]
Then, after I copied the backed-up saveconfig.json back to /etc/target/ and restarted the gluster-blockd and tcmu-runner services, node.163 loads everything back successfully:
# targetcli ls
o- / ......................................................................................................................... [...]
o- backstores .............................................................................................................. [...]
| o- block .................................................................................................. [Storage Objects: 0]
| o- fileio ................................................................................................. [Storage Objects: 0]
| o- pscsi .................................................................................................. [Storage Objects: 0]
| o- ramdisk ................................................................................................ [Storage Objects: 0]
| o- user:fbo ............................................................................................... [Storage Objects: 0]
| o- user:glfs .............................................................................................. [Storage Objects: 0]
| o- user:qcow .............................................................................................. [Storage Objects: 0]
| o- user:zbc ............................................................................................... [Storage Objects: 0]
o- iscsi ............................................................................................................ [Targets: 0]
o- loopback ......................................................................................................... [Targets: 0]
o- vhost ............................................................................................................ [Targets: 0]
o- xen-pvscsi ....................................................................................................... [Targets: 0]
# systemctl daemon-reload; systemctl restart tcmu-runner gluster-blockd; systemctl status tcmu-runner gluster-blockd
● tcmu-runner.service - LIO Userspace-passthrough daemon
Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2019-09-16 09:03:03 CST; 191ms ago
Docs: man:tcmu-runner(8)
Main PID: 1908 (tcmu-runner)
CGroup: /system.slice/tcmu-runner.service
└─1908 /usr/bin/tcmu-runner
Sep 16 09:03:03 rhel2 systemd[1]: Stopping LIO Userspace-passthrough daemon...
Sep 16 09:03:03 rhel2 systemd[1]: Starting LIO Userspace-passthrough daemon...
Sep 16 09:03:03 rhel2 tcmu-runner[1908]: log file path now is '/var/log/tcmu-runner.log'
Sep 16 09:03:03 rhel2 tcmu-runner[1908]: Starting...
Sep 16 09:03:03 rhel2 systemd[1]: Started LIO Userspace-passthrough daemon.
● gluster-blockd.service - Gluster block storage utility
Loaded: loaded (/usr/lib/systemd/system/gluster-blockd.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2019-09-16 09:03:03 CST; 18ms ago
Main PID: 1952 (gluster-blockd)
CGroup: /system.slice/gluster-blockd.service
├─1952 /usr/sbin/gluster-blockd --glfs-lru-count 5 --log-level INFO
└─1955 /usr/sbin/gluster-blockd --glfs-lru-count 5 --log-level INFO
Sep 16 09:03:03 rhel2 systemd[1]: Started Gluster block storage utility.
Sep 16 09:03:03 rhel2 systemd[1]: Starting Gluster block storage utility...
# targetcli ls
o- / ......................................................................................................................... [...]
o- backstores .............................................................................................................. [...]
| o- block .................................................................................................. [Storage Objects: 0]
| o- fileio ................................................................................................. [Storage Objects: 0]
| o- pscsi .................................................................................................. [Storage Objects: 0]
| o- ramdisk ................................................................................................ [Storage Objects: 0]
| o- user:fbo ............................................................................................... [Storage Objects: 0]
| o- user:glfs .............................................................................................. [Storage Objects: 1]
| | o- block0 ........................... [repvol@localhost/block-store/5bf22bfa-965a-48c3-8a43-f0d5018e9303 (100.0MiB) activated]
| | o- alua ................................................................................................... [ALUA Groups: 3]
| | o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
| | o- glfs_tg_pt_gp_ano .................................................................. [ALUA state: Active/non-optimized]
| | o- glfs_tg_pt_gp_ao ....................................................................... [ALUA state: Active/optimized]
| o- user:qcow .............................................................................................. [Storage Objects: 0]
| o- user:zbc ............................................................................................... [Storage Objects: 0]
o- iscsi ............................................................................................................ [Targets: 1]
| o- iqn.2016-12.org.gluster-block:5bf22bfa-965a-48c3-8a43-f0d5018e9303 ................................................ [TPGs: 3]
| o- tpg1 ........................................................................................................... [disabled]
| | o- acls .......................................................................................................... [ACLs: 0]
| | o- luns .......................................................................................................... [LUNs: 1]
| | | o- lun0 ................................................................................. [user/block0 (glfs_tg_pt_gp_ao)]
| | o- portals .................................................................................................... [Portals: 1]
| | o- 192.168.195.162:3260 ............................................................................................. [OK]
| o- tpg2 .................................................................................................. [gen-acls, no-auth]
| | o- acls .......................................................................................................... [ACLs: 0]
| | o- luns .......................................................................................................... [LUNs: 1]
| | | o- lun0 ................................................................................ [user/block0 (glfs_tg_pt_gp_ano)]
| | o- portals .................................................................................................... [Portals: 1]
| | o- 192.168.195.163:3260 ............................................................................................. [OK]
| o- tpg3 ........................................................................................................... [disabled]
| o- acls .......................................................................................................... [ACLs: 0]
| o- luns .......................................................................................................... [LUNs: 1]
| | o- lun0 ................................................................................ [user/block0 (glfs_tg_pt_gp_ano)]
| o- portals .................................................................................................... [Portals: 1]
| o- 192.168.195.164:3260 ............................................................................................. [OK]
o- loopback ......................................................................................................... [Targets: 0]
o- vhost ............................................................................................................ [Targets: 0]
o- xen-pvscsi ....................................................................................................... [Targets: 0]
#
Did I miss something here?
Did I miss something here?
I will check on this; for now it looks like a targetcli bug to me. I will fix it if I hit the same issue in my testing today. Thanks!
@pkalever
Tested it again and found one new problem.
I have 3 nodes, named rhel1, rhel2 and rhel3.
On rhel3 I have enabled targetclid.service, and on that node the reload always succeeds, no matter which node runs the 'gluster-block reload ...' command.
That means it works with the targetclid service running.
But if I disable the targetclid service at boot and then manually start it later, the reload mostly won't work for me.
That means we must enable and start the targetclid service when booting the node.
Before this, I assumed the targetclid service shouldn't affect the reload feature; is that right?
Thanks
@pkalever
[...]
But if I disable the targetclid service at boot and then manually start it later, the reload mostly won't work for me.
That means we must enable and start the targetclid service when booting the node.
Before this, I assumed the targetclid service shouldn't affect the reload feature; is that right?
I don't completely understand the problem you are seeing, but enabling/disabling the targetclid service shouldn't matter.
gluster-blockd.service already has "Wants=targetclid.service" with PR#203; if the daemon is started, gluster-blockd will use it, otherwise not.
Can you detail the steps with which you are seeing the issue? Please note I have just updated PR#203; please test this PR together with the latest PR#203.
Thanks!
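For reference, the relevant part of the unit file looks roughly like this; only the Wants= line is what PR#203 adds, the Description is taken from the service status shown earlier, and the rest of the unit is omitted:

# /usr/lib/systemd/system/gluster-blockd.service (illustrative fragment)
[Unit]
Description=Gluster block storage utility
Wants=targetclid.service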
@pkalever
[...]
But if I disable the targetclid service at boot and then manually start it later, the reload mostly won't work for me. That means we must enable and start the targetclid service when booting the node. Before this, I assumed the targetclid service shouldn't affect the reload feature; is that right?
I don't completely understand the problem you are seeing, but enabling/disabling the targetclid service shouldn't matter.
gluster-blockd.service already has "Wants=targetclid.service" with PR#203; if the daemon is started, gluster-blockd will use it, otherwise not.
Can you detail the steps with which you are seeing the issue? Please note I have just updated PR#203; please test this PR together with the latest PR#203.
Thanks!
I meant that targetclid is a must here for the reload to work; otherwise it will fail.
From my test:
1. systemctl enable targetclid
2. cp /etc/target/saveconfig.json /tmp/ && rm /etc/target/saveconfig.json
3. reboot the node
4. run gluster-block reload, and it works for me
But if:
1. systemctl disable targetclid
2. cp /etc/target/saveconfig.json /tmp/ && rm /etc/target/saveconfig.json
3. reboot the node
4. systemctl start targetclid
5. run gluster-block reload, and mostly it does not work well
Without step 4 it never works for me; with step 4, it sometimes works.
BTW, if the user does not want the targetclid service mode, the reload won't work.
Thanks. BRs
@lxbsz thanks for the detailed steps. Have you tested it with the latest PR#203?
If it still doesn't work for you, please let me know; also please let me know at which step you started gluster-blockd in both cases.
Thanks!
@lxbsz thanks for the detailed steps. Have you tested it with the latest PR#203?
If it still doesn't work for you, please let me know; also please let me know at which step you started gluster-blockd in both cases.
Two nodes work now, but one still doesn't:
# gluster-block reload repvol/block0
FAILED ON: 192.168.195.163
SUCCESSFUL ON: 192.168.195.162 192.168.195.164
RESULT: FAIL
I am not sure what I have missed here; all the code is the same.
@lxbsz thanks for the detailed steps. Have you tested it with the latest PR#203?
If it still doesn't work for you, please let me know; also please let me know at which step you started gluster-blockd in both cases.
gluster-blockd is started by systemd when the node boots, not manually.
@lxbsz thanks for the detailed steps. Have you tested it with the latest PR#203? If it still doesn't work for you, please let me know; also please let me know at which step you started gluster-blockd in both cases.
Two nodes work now, but one still doesn't:
# gluster-block reload repvol/block0
FAILED ON: 192.168.195.163
SUCCESSFUL ON: 192.168.195.162 192.168.195.164
RESULT: FAIL
I am not sure what I have missed here; all the code is the same.
Sep 26 16:01:55 rhel2 gluster-blockd[1416]: restore_from_file() takes at most 4 arguments (5 given)
I hit this log on the 192.168.195.163 node.
Sep 26 16:01:55 rhel2 gluster-blockd[1416]: restore_from_file() takes at most 4 arguments (5 given)
@lxbsz This indicates your targetcli/rtslib is not updated on 192.168.195.163
Sep 26 16:01:55 rhel2 gluster-blockd[1416]: restore_from_file() takes at most 4 arguments (5 given)
@lxbsz This indicates your targetcli/rtslib is not updated on 192.168.195.163
rtslib-fb]# git log
commit 48d4437e42666df6348564ee21d03f29b8a8c48b
Author: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Date: Wed Sep 25 14:56:23 2019 +0530
restoreconfig: fix skipping of targets [re]loading
Problem:
'targetcli restoreconfig [savefile] [target=...]' works for first target only
in the saveconfig.json and fails/skips for others silently.
Solution:
Just an indentation fix, yet very severe.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
commit 2b160b754d48d5dfdfe1d41089d4e9af24ba3b29
Author: Maurizio Lombardi <mlombard@redhat.com>
Date: Mon Aug 26 12:06:22 2019 +0200
version 2.1.70
rtslib-fb]# rm /usr/lib/python2.7/site-packages/rtslib_fb-2.1.70-py2.7.egg
rm: remove regular file ‘/usr/lib/python2.7/site-packages/rtslib_fb-2.1.70-py2.7.egg’? y
rtslib-fb]# ./setup.py install
running install
running bdist_egg
[...]
targetcli-fb]# git log
commit 2a71cece17fcdd40bc022ed1fc009e6f5b9415e8
Author: Maurizio Lombardi <mlombard@redhat.com>
Date: Mon Aug 26 12:10:40 2019 +0200
version 2.1.50
commit 2a94314b7b131141fba885864597b3fb20af1f27
Merge: a9771b1 26b7df6
Author: Maurizio Lombardi <mlombard@redhat.com>
Date: Mon Aug 26 09:51:26 2019 +0200
Merge pull request #144 from pkalever/reload-single-so-tg
[targetcli] restoreconfig: add ability to restore/reload single target or storage_object
targetcli-fb]# rm /usr/lib/python2.7/site-packages/targetcli_fb-2.1.50-py2.7.egg
rm: remove regular file ‘/usr/lib/python2.7/site-packages/targetcli_fb-2.1.50-py2.7.egg’? y
targetcli-fb]# ./setup.py install
running install
running bdist_egg
[...]
● targetclid.service - Targetcli daemon
Loaded: loaded (/usr/lib/systemd/system/targetclid.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2019-09-26 15:53:08 CST; 1h 3min ago
Main PID: 1011 (targetclid)
CGroup: /system.slice/targetclid.service
└─1011 /usr/bin/python /usr/bin/targetclid
Sep 26 15:53:08 rhel2 systemd[1]: Started Targetcli daemon.
Sep 26 15:53:08 rhel2 systemd[1]: Starting Targetcli daemon...
All 3 nodes are set up the same way as above.
From the logs it seems the install is somehow not correct; can you see the problem here?
@lxbsz are you expecting anything from me? I have tested this and it works well for me. Please fix your node; why not pick a fresh node? (Check if you have any rpms installed, etc.)
@lxbsz are you expecting anything from me? I have tested this and it works well for me. Please fix your node; why not pick a fresh node? (Check if you have any rpms installed, etc.)
Just tried a new node and it works.
Thanks.
@lxbsz thanks for confirming. Please consider adding your test results to rtslib fix https://github.com/open-iscsi/rtslib-fb/pull/153, Maurizio will be waiting on you.
@lxbsz thanks for confirming. Please consider adding your test results to rtslib fix open-iscsi/rtslib-fb#153, Maurizio will be waiting on you.
Sure.
@lxbsz updated with the capabilities and version-check support patches; please check. Thanks!
@pkalever This looks good to me. Thanks
@lxbsz I have added the man page and other doc changes, and also fixed the missing force option with reload.
Requesting a detailed review. Thanks!
@lxbsz please take a look.
@lxbsz updated the tags, merged now. Thanks!
What does this PR achieve? Why do we need it?
Problem:
Right now, if any block volume fails to load as part of service bring-up or a node reboot (perhaps because of an issue in the backend), there is no way to reload that single block volume: we need to reload/restart all the block volumes present on the node just to load one of them, which interrupts ongoing I/O (via this path) for all the volumes hosted on the node.
Solution:
Add the ability to reload a single block volume without touching the other block volumes on the node.
$ gluster-block help
usage: gluster-block [timeout <seconds>] <command> <volname[/blockname]> [<args>] [--json*]
commands:
  [...]
  reload <volname/blockname>
        reload a block device.
  [...]
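For example, reloading a single block volume (the invocation exercised throughout this thread):

# gluster-block reload repvol/block0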
Notes for the reviewer
TODO:
Signed-off-by: Prasanna Kumar Kalever prasanna.kalever@redhat.com