ceph / ceph-iscsi-cli

NOTICE: moved to https://github.com/ceph/ceph-iscsi
GNU General Public License v3.0
25 stars 25 forks source link

rbd-target-api IOError on adding second gateway #9

Open kaeltis opened 7 years ago

kaeltis commented 7 years ago

I just wiped my configuration (deleted gateway.conf from the pool and rebooted the hosts) after already having the iSCSI Gateway working without any issues.

I followed the same steps as I did before - on my first node (node1) I created a target, added the node itself as a gateway and then tried adding a second node (node2) as gateway. It however now errors out every time (even after multiple reboots and deleting gateway.conf from the pool) with the following message:

/iscsi-target...sigw/gateways> create node1 192.168.44.205
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
ok
/iscsi-target...sigw/gateways> create node2 192.168.44.206
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
ValueError: No JSON object could be decoded

rbd-target-api on node2 shows the following:

Aug 24 03:39:44 node2.ceph.xxxx.com rbd-target-api[20832]: Started the configuration object watcher
Aug 24 03:39:44 node2.ceph.xxxx.com rbd-target-api[20832]: Checking for config object changes every 1s
Aug 24 03:39:44 node2.ceph.xxxx.com rbd-target-api[20832]:  * Running on http://0.0.0.0:5000/
Aug 24 03:39:57 node2.ceph.xxxx.com rbd-target-api[20832]: Change detected - internal 1 / xattr 2 refreshing
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: 192.168.44.205 - - [24/Aug/2017 03:40:02] "GET /api/sysinfo/ipv4_addresses HTTP/1.1" 200 -
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: 192.168.44.205 - - [24/Aug/2017 03:40:02] "GET /api/sysinfo/checkconf HTTP/1.1" 200 -
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: 192.168.44.205 - - [24/Aug/2017 03:40:02] "GET /api/sysinfo/checkversions HTTP/1.1" 200 -
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: 192.168.44.205 - - [24/Aug/2017 03:40:02] "PUT /api/_gateway/node2 HTTP/1.1" 500 -
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: Traceback (most recent call last):
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: return self.wsgi_app(environ, start_response)
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: response = self.make_response(self.handle_exception(e))
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: reraise(exc_type, exc_value, tb)
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: response = self.full_dispatch_request()
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: rv = self.handle_user_exception(e)
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: reraise(exc_type, exc_value, tb)
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: rv = self.dispatch_request()
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: return self.view_functions[rule.endpoint](**req.view_args)
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/bin/rbd-target-api", line 94, in decorated
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: return f(*args, **kwargs)
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/bin/rbd-target-api", line 475, in _gateway
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: gateway.manage(target_mode)
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/ceph_iscsi_config/gateway.py", line 420, in manage
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: self.load_config()
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/ceph_iscsi_config/gateway.py", line 256, in load_config
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: lio_root = root.RTSRoot()
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/rtslib_fb/root.py", line 85, in __init__
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: self._set_dbroot_if_needed()
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/rtslib_fb/root.py", line 167, in _set_dbroot_if_needed
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: fwrite(dbroot_path, self._preferred_dbroot+"\n")
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: File "/usr/lib/python2.7/site-packages/rtslib_fb/utils.py", line 79, in fwrite
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: file_fd.write(str(string))
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: IOError: [Errno 22] Invalid argument
Aug 24 03:40:02 node2.ceph.xxxx.com rbd-target-api[20832]: Change detected - internal 2 / xattr 3 refreshing
pcuzner commented 7 years ago

The error looks like it's from below the iscsi userland stuff in rtslib_fb which is weird. So lets take a look at the purge process and make sure we're clean before we dig deeper. The message seems to imply that sysfs is not clean.

things to check targetcli ls --> should show no entries - it it does run clearconfig config=true /etc/target should not have a saveconfig.json file(it there is something there, rename it out of the way target service should be disabled/masked - iscsi state is restored by rbd-target-gw, so this is svc is no longer needed

The ansible code provides a purge playbook, and the gwcli provides a clearconfig command(you have to drop the config down to just the gateway entries though for this to work atm)

mikechristie commented 7 years ago

Did you use rtslib fb from git or a distro package or upstream release?

There were a couple bug fixes for the dbroot code you seem to be crashing on. I think one change caused a regression but it was fixed in a later commit.

I'm actually in vacation so it's kind of hard to debug from my phone. If you are just pulled rtslib fb from their GitHub could you recline and resinstall?

kaeltis commented 7 years ago

@pcuzner targetcli config is empty and service is masked, I deleted the .gwcli folder from my users home and via "rados -p rbd rm gateway.conf" the config from the pool after stopping the iscsi-target-api/gw and before rebooting the hosts.

After the reboot I checked that file and it was correctly initialized with an empty template and the iscsi-target-api/gw services were correctly started on all nodes and detected the empty/template configuration.

@mikechristie At first I installed the "python-rtslib" version "2.1.fb57-5.el7" from distro packages.

Since that version didn't work in my initial tests, I versionlocked it with yum and replaced it via "pip install --upgrade rtslib-fb" to still fulfill any package dependencies. pip correctly detected the installed version and replaced it with "rtslib-fb-2.1.64" and it worked without any issues before I manually wiped the config.


I noticed something else, dmesg shows the following on node2 after every successive try to add it if I don't reboot it in between: db_root: cannot be changed: target drivers registered

pcuzner commented 7 years ago

What's the OS and kernel version you're using?

kaeltis commented 7 years ago

CentOS 7.3.1611 and Kernel 4.13.0-rc6 with these patches: (compiled with .conf from 3.10.0-693.1.1.el7.x86_64 and "make olddefconfig")

https://patchwork.kernel.org/patch/9894979/ https://patchwork.kernel.org/patch/9894977/ https://patchwork.kernel.org/patch/9897535/ https://patchwork.kernel.org/patch/9897537/ https://patchwork.kernel.org/patch/9903265/


I just noticed that the CentOS default linux-firmware comes with an outdated intel-microcode for 4.13.0-rc6, dmesg complains:

[    0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0xb000020 (or later)
[    1.331008] microcode: sig=0x406f1, pf=0x1, revision=0xb00001d
[    1.331101] microcode: Microcode Update Driver: v2.2.

Not sure if that's a potential issue - should I update linux-firmware as well from the kernel git?


Updated linux-firmware and microcode with the most up to date files from the centos git, the TSC_DEADLINE message is gone now.

kaeltis commented 7 years ago

Did another attempt, this time I was able to add two of three gateways instead of only one. Now I didn't get the error when adding node2, but when adding node3:

/iscsi-target....ceph:iscsigw> cd gateways
/iscsi-target...sigw/gateways> create node1 192.168.44.205
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
ok
/iscsi-target...sigw/gateways> create node2 192.168.44.206
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
ok
/iscsi-target...sigw/gateways> create node3 192.168.44.207
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
ValueError: No JSON object could be decoded

gateway.conf after the error (node3 only got added half-way with its IP to node1 gateway?):

{
    "clients": {},
    "created": "2017/08/24 11:11:47",
    "disks": {},
    "epoch": 5,
    "gateways": {
        "created": "2017/08/24 20:08:29",
        "ip_list": [
            "192.168.44.205",
            "192.168.44.206"
        ],
        "iqn": "iqn.2017-01.com.xxx.ceph:iscsigw",
        "node1": {
            "active_luns": 0,
            "created": "2017/08/24 20:08:43",
            "gateway_ip_list": [
                "192.168.44.205",
                "192.168.44.206",
                "192.168.44.207"
            ],
            "inactive_portal_ips": [
                "192.168.44.206",
                "192.168.44.207"
            ],
            "iqn": "iqn.2017-01.com.xxx.ceph:iscsigw",
            "portal_ip_address": "192.168.44.205",
            "tpgs": 3,
            "updated": "2017/08/24 20:08:53"
        },
        "node2": {
            "active_luns": 0,
            "created": "2017/08/24 20:08:50",
            "gateway_ip_list": [
                "192.168.44.205",
                "192.168.44.206"
            ],
            "inactive_portal_ips": [
                "192.168.44.205"
            ],
            "iqn": "iqn.2017-01.com.xxx.ceph:iscsigw",
            "portal_ip_address": "192.168.44.206",
            "tpgs": 2,
            "updated": "2017/08/24 20:08:50"
        }
    },
    "groups": {},
    "updated": "2017/08/24 20:08:53",
    "version": 3
}

rbd-target-api log on node1:

Aug 24 21:29:43 node1.ceph.xxx.com systemd[1]: Started Ceph iscsi target configuration API.
Aug 24 21:29:43 node1.ceph.xxx.com systemd[1]: Starting Ceph iscsi target configuration API...
Aug 24 21:29:48 node1.ceph.xxx.com rbd-target-api[1555]: Started the configuration object watcher
Aug 24 21:29:48 node1.ceph.xxx.com rbd-target-api[1555]: Checking for config object changes every 1s
Aug 24 21:29:48 node1.ceph.xxx.com rbd-target-api[1555]:  * Running on http://0.0.0.0:5000/
Aug 24 22:07:54 node1.ceph.xxx.com rbd-target-api[1555]: 127.0.0.1 - - [24/Aug/2017 22:07:54] "GET /api/config HTTP/1.1" 200 -
Aug 24 22:08:04 node1.ceph.xxx.com rbd-target-api[1555]: 127.0.0.1 - - [24/Aug/2017 22:08:04] "GET /api/config HTTP/1.1" 200 -
Aug 24 22:08:29 node1.ceph.xxx.com rbd-target-api[1555]: (Gateway.create_target) created an iscsi target with iqn of 'iqn.2017-01.com.xxx.ceph:iscsigw'
Aug 24 22:08:29 node1.ceph.xxx.com rbd-target-api[1555]: 127.0.0.1 - - [24/Aug/2017 22:08:29] "PUT /api/target/iqn.2017-01.com.xxx.ceph:iscsigw HTTP/1.1" 200 -
Aug 24 22:08:30 node1.ceph.xxx.com rbd-target-api[1555]: Change detected - internal 0 / xattr 1 refreshing
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: 127.0.0.1 - - [24/Aug/2017 22:08:43] "GET /api/config HTTP/1.1" 200 -
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: 192.168.44.205 - - [24/Aug/2017 22:08:43] "GET /api/sysinfo/ipv4_addresses HTTP/1.1" 200 -
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: 192.168.44.205 - - [24/Aug/2017 22:08:43] "GET /api/sysinfo/checkconf HTTP/1.1" 200 -
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: 192.168.44.205 - - [24/Aug/2017 22:08:43] "GET /api/sysinfo/checkversions HTTP/1.1" 200 -
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: (Gateway.load_config) successfully loaded existing target definition
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: An additional 1 tpg's are required
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: (Gateway.create_target) created an iscsi target with iqn of 'iqn.2017-01.com.xxx.ceph:iscsigw'
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: created the gateway
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: refreshing the configuration after the gateway creation
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: 127.0.0.1 - - [24/Aug/2017 22:08:43] "PUT /api/_gateway/node1 HTTP/1.1" 200 -
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: _gateway update on 127.0.0.1, successful
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: 127.0.0.1 - - [24/Aug/2017 22:08:43] "PUT /api/gateway/node1 HTTP/1.1" 200 -
Aug 24 22:08:43 node1.ceph.xxx.com rbd-target-api[1555]: 192.168.44.205 - - [24/Aug/2017 22:08:43] "GET /api/config HTTP/1.1" 200 -
Aug 24 22:08:49 node1.ceph.xxx.com rbd-target-api[1555]: 127.0.0.1 - - [24/Aug/2017 22:08:49] "GET /api/config HTTP/1.1" 200 -
Aug 24 22:08:49 node1.ceph.xxx.com rbd-target-api[1555]: (Gateway.load_config) successfully loaded existing target definition
Aug 24 22:08:49 node1.ceph.xxx.com rbd-target-api[1555]: An additional 1 tpg's are required
Aug 24 22:08:49 node1.ceph.xxx.com rbd-target-api[1555]: (Gateway.create_target) created an iscsi target with iqn of 'iqn.2017-01.com.xxx.ceph:iscsigw'
Aug 24 22:08:49 node1.ceph.xxx.com rbd-target-api[1555]: created the gateway
Aug 24 22:08:49 node1.ceph.xxx.com rbd-target-api[1555]: refreshing the configuration after the gateway creation
Aug 24 22:08:49 node1.ceph.xxx.com rbd-target-api[1555]: 192.168.44.205 - - [24/Aug/2017 22:08:49] "PUT /api/_gateway/node2 HTTP/1.1" 200 -
Aug 24 22:08:49 node1.ceph.xxx.com rbd-target-api[1555]: _gateway update on 192.168.44.205, successful
Aug 24 22:08:50 node1.ceph.xxx.com rbd-target-api[1555]: _gateway update on 192.168.44.206, successful
Aug 24 22:08:50 node1.ceph.xxx.com rbd-target-api[1555]: 127.0.0.1 - - [24/Aug/2017 22:08:50] "PUT /api/gateway/node2 HTTP/1.1" 200 -
Aug 24 22:08:50 node1.ceph.xxx.com rbd-target-api[1555]: Change detected - internal 3 / xattr 4 refreshing
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: 127.0.0.1 - - [24/Aug/2017 22:08:53] "GET /api/config HTTP/1.1" 200 -
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: (Gateway.load_config) successfully loaded existing target definition
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: An additional 1 tpg's are required
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: (Gateway.create_target) created an iscsi target with iqn of 'iqn.2017-01.com.xxx.ceph:iscsigw'
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: created the gateway
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: refreshing the configuration after the gateway creation
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: 192.168.44.205 - - [24/Aug/2017 22:08:53] "PUT /api/_gateway/node3 HTTP/1.1" 200 -
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: _gateway update on 192.168.44.205, successful
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: _gateway change on 192.168.44.206 failed with 500
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: 127.0.0.1 - - [24/Aug/2017 22:08:53] "PUT /api/gateway/node3 HTTP/1.1" 500 -
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: Traceback (most recent call last):
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: return self.wsgi_app(environ, start_response)
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: response = self.make_response(self.handle_exception(e))
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: reraise(exc_type, exc_value, tb)
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: response = self.full_dispatch_request()
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: rv = self.handle_user_exception(e)
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: reraise(exc_type, exc_value, tb)
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: rv = self.dispatch_request()
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: return self.view_functions[rule.endpoint](**req.view_args)
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/bin/rbd-target-api", line 94, in decorated
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: return f(*args, **kwargs)
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/bin/rbd-target-api", line 297, in gateway
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: api_vars=api_vars)
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/bin/rbd-target-api", line 1282, in call_api
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: fail_msg += api.response.json()['message']
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib/python2.7/site-packages/requests/models.py", line 802, in json
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: return json.loads(self.text, **kwargs)
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: return _default_decoder.decode(s)
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: obj, end = self.raw_decode(s, idx=_w(s, 0).end())
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: raise ValueError("No JSON object could be decoded")
Aug 24 22:08:53 node1.ceph.xxx.com rbd-target-api[1555]: ValueError: No JSON object could be decoded

rbd-target-api log on node2:

Aug 24 21:41:25 node2.ceph.xxx.com systemd[1]: Started Ceph iscsi target configuration API.
Aug 24 21:41:25 node2.ceph.xxx.com systemd[1]: Starting Ceph iscsi target configuration API...
Aug 24 21:41:30 node2.ceph.xxx.com rbd-target-api[1565]: Started the configuration object watcher
Aug 24 21:41:30 node2.ceph.xxx.com rbd-target-api[1565]: Checking for config object changes every 1s
Aug 24 21:41:30 node2.ceph.xxx.com rbd-target-api[1565]:  * Running on http://0.0.0.0:5000/
Aug 24 22:08:30 node2.ceph.xxx.com rbd-target-api[1565]: Change detected - internal 0 / xattr 1 refreshing
Aug 24 22:08:44 node2.ceph.xxx.com rbd-target-api[1565]: Change detected - internal 1 / xattr 2 refreshing
Aug 24 22:08:49 node2.ceph.xxx.com rbd-target-api[1565]: 192.168.44.205 - - [24/Aug/2017 22:08:49] "GET /api/sysinfo/ipv4_addresses HTTP/1.1" 200 -
Aug 24 22:08:49 node2.ceph.xxx.com rbd-target-api[1565]: 192.168.44.205 - - [24/Aug/2017 22:08:49] "GET /api/sysinfo/checkconf HTTP/1.1" 200 -
Aug 24 22:08:49 node2.ceph.xxx.com rbd-target-api[1565]: 192.168.44.205 - - [24/Aug/2017 22:08:49] "GET /api/sysinfo/checkversions HTTP/1.1" 200 -
Aug 24 22:08:50 node2.ceph.xxx.com rbd-target-api[1565]: (Gateway.create_target) created an iscsi target with iqn of 'iqn.2017-01.com.xxx.ceph:iscsigw'
Aug 24 22:08:50 node2.ceph.xxx.com rbd-target-api[1565]: (Gateway.create_target) created an iscsi target with iqn of 'iqn.2017-01.com.xxx.ceph:iscsigw'
Aug 24 22:08:50 node2.ceph.xxx.com rbd-target-api[1565]: (Gateway.create_target) created an iscsi target with iqn of 'iqn.2017-01.com.xxx.ceph:iscsigw'
Aug 24 22:08:50 node2.ceph.xxx.com rbd-target-api[1565]: created the gateway
Aug 24 22:08:50 node2.ceph.xxx.com rbd-target-api[1565]: refreshing the configuration after the gateway creation
Aug 24 22:08:50 node2.ceph.xxx.com rbd-target-api[1565]: 192.168.44.205 - - [24/Aug/2017 22:08:50] "PUT /api/_gateway/node2 HTTP/1.1" 200 -
Aug 24 22:08:50 node2.ceph.xxx.com rbd-target-api[1565]: 192.168.44.205 - - [24/Aug/2017 22:08:50] "GET /api/config HTTP/1.1" 200 -
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: 192.168.44.205 - - [24/Aug/2017 22:08:53] "PUT /api/_gateway/node3 HTTP/1.1" 500 -
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: Traceback (most recent call last):
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: return self.wsgi_app(environ, start_response)
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: response = self.make_response(self.handle_exception(e))
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: reraise(exc_type, exc_value, tb)
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: response = self.full_dispatch_request()
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: rv = self.handle_user_exception(e)
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: reraise(exc_type, exc_value, tb)
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: rv = self.dispatch_request()
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: return self.view_functions[rule.endpoint](**req.view_args)
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/bin/rbd-target-api", line 94, in decorated
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: return f(*args, **kwargs)
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/bin/rbd-target-api", line 475, in _gateway
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: gateway.manage(target_mode)
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/ceph_iscsi_config/gateway.py", line 420, in manage
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: self.load_config()
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/ceph_iscsi_config/gateway.py", line 256, in load_config
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: lio_root = root.RTSRoot()
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/rtslib_fb/root.py", line 85, in __init__
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: self._set_dbroot_if_needed()
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/rtslib_fb/root.py", line 167, in _set_dbroot_if_needed
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: fwrite(dbroot_path, self._preferred_dbroot+"\n")
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: File "/usr/lib/python2.7/site-packages/rtslib_fb/utils.py", line 79, in fwrite
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: file_fd.write(str(string))
Aug 24 22:08:53 node2.ceph.xxx.com rbd-target-api[1565]: IOError: [Errno 22] Invalid argument
Aug 24 22:08:54 node2.ceph.xxx.com rbd-target-api[1565]: Change detected - internal 4 / xattr 5 refreshing

rbd-target-api log on node3:

Aug 24 21:56:06 node3.ceph.xxx.com systemd[1]: Started Ceph iscsi target configuration API.
Aug 24 21:56:06 node3.ceph.xxx.com systemd[1]: Starting Ceph iscsi target configuration API...
Aug 24 21:56:07 node3.ceph.xxx.com rbd-target-api[1720]: Started the configuration object watcher
Aug 24 21:56:08 node3.ceph.xxx.com rbd-target-api[1720]: Checking for config object changes every 1s
Aug 24 21:56:08 node3.ceph.xxx.com rbd-target-api[1720]:  * Running on http://0.0.0.0:5000/
Aug 24 22:08:30 node3.ceph.xxx.com rbd-target-api[1720]: Change detected - internal 0 / xattr 1 refreshing
Aug 24 22:08:44 node3.ceph.xxx.com rbd-target-api[1720]: Change detected - internal 1 / xattr 2 refreshing
Aug 24 22:08:50 node3.ceph.xxx.com rbd-target-api[1720]: Change detected - internal 2 / xattr 4 refreshing
Aug 24 22:08:53 node3.ceph.xxx.com rbd-target-api[1720]: 192.168.44.205 - - [24/Aug/2017 22:08:53] "GET /api/sysinfo/ipv4_addresses HTTP/1.1" 200 -
Aug 24 22:08:53 node3.ceph.xxx.com rbd-target-api[1720]: 192.168.44.205 - - [24/Aug/2017 22:08:53] "GET /api/sysinfo/checkconf HTTP/1.1" 200 -
Aug 24 22:08:53 node3.ceph.xxx.com rbd-target-api[1720]: 192.168.44.205 - - [24/Aug/2017 22:08:53] "GET /api/sysinfo/checkversions HTTP/1.1" 200 -
Aug 24 22:08:53 node3.ceph.xxx.com rbd-target-api[1720]: Change detected - internal 4 / xattr 5 refreshing

According to the logs it seems node2 is causing the error when adding node3

When creating the gateway.conf file manually and pushing it to the pool it seems to work without any further issues.

mikechristie commented 7 years ago

Just adding note in case someone stumbles on this for the dbroot errors. I think it might be fixed with this pr for rtslib fb

https://github.com/open-iscsi/rtslib-fb/pull/106

I'm still on vacation (my first vacation in like 3 years :)) so I have not had a chance to verify.