Seagate / cortx-prvsnr

CORTX Provisioner offers a framework which accepts configurations (cluster.yaml and config.yaml) in the form of a ConfigMap, translates them into the internal configuration store (CORTX Conf Store), and then orchestrates the component mini-provisioners so they can configure their services. In a Kubernetes environment, the CORTX Provisioner framework runs on all CORTX PODs (in a separate, one-time init container).
https://github.com/Seagate/cortx
GNU Affero General Public License v3.0

Multi-node Setup Error #767

Closed: PengMacro closed this issue 3 years ago

PengMacro commented 3 years ago

I am trying to test a multi-node setup but got an error when running the "provisioner" command. Here are my config.ini and setup.log.

I used the network aliases ifcfg-eth0:0, ifcfg-eth0:1, and ifcfg-eth0:2 so that I can have three IP addresses on the same subnet. You can check here for those ifcfg files. Please let me know if you need more information. Thanks.

johnbent commented 3 years ago

Plus @huanghua78, @ipoddubnyy, and @suntins.

johnbent commented 3 years ago

To save folks a click, here is the reported error:

2020-11-21 05:01:34,150 - ERROR - provisioner failed

srvnode-3:
----------
          ID: Rescan SCSI
    Function: module.run
      Result: True
     Comment: scsi.rescan_all: ['- - - > /sys/class/scsi_host/host0/scan']
----------
          ID: Install multipath
    Function: pkg.installed
        Name: device-mapper-multipath
      Result: True
     Comment: 1 targeted package was installed/updated.
     Changes: device-mapper-multipath, device-mapper-multipath-libs and kpartx upgraded to 0.4.9-134.el7_9
----------
        Name: multipathd.service - Function: service.dead - Result: Clean
----------
          ID: Copy multipath config
    Function: file.managed
        Name: /etc/multipath.conf
      Result: True
     Comment: File /etc/multipath.conf updated
----------
          ID: Start multipath service
    Function: service.running
        Name: multipathd.service
      Result: True
     Comment: Service multipathd.service is already enabled, and is running
----------
          ID: Check multipath devices
    Function: cmd.run
        Name: test multipath -ll | grep mpath | wc -l -ge 7
      Result: False
     Comment: Attempt 1: Returned a result of "False", with the following comment: "Command "test multipath -ll | grep mpath | wc -l -ge 7" run"
              Attempt 2: Returned a result of "False", with the same comment
     Changes: retcode: 1, stdout/stderr empty
----------
          ID: Update cluster.sls pillar
    Function: module.run
      Result: False
     Comment: One or more requisite failed: components.system.storage.multipath.config.Check multipath devices
----------
          ID: Restart service multipath
    Function: module.run
      Result: False
     Comment: One or more requisite failed: components.system.storage.multipath.config.Update cluster.sls pillar
----------
          ID: Generate multipath checkpoint flag
    Function: file.managed
        Name: /opt/seagate/cortx/provisioner/generated_configs/srvnode-3.multipath
      Result: True
     Comment: Empty file

Summary for srvnode-3
------------
Succeeded: 6 (changed=6)
Failed:    3
------------
Total states run: 9
Total run time: 18.016 s

srvnode-2 and srvnode-1 run the same nine states with the same results: "Check multipath devices" fails (retcode 1), and "Update cluster.sls pillar" and "Restart service multipath" fail on its requisite.

Summary for srvnode-2
------------
Succeeded: 6 (changed=6)
Failed:    3
------------
Total states run: 9
Total run time: 18.427 s
ERROR: Minions returned with non-zero exit code

Summary for srvnode-1
------------
Succeeded: 6 (changed=6)
Failed:    3
------------
Total states run: 9
Total run time: 20.992 s

suntins commented 3 years ago

@PengMacro please take note of the prerequisite hardware requirements. The JBOD setup requires a minimum of 7 disks.

Please run the command below to check:

multipath -ll | grep mpath | wc -l
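
For reference, a minimal sketch of the kind of check the provisioner appears to perform, assuming plain bash (the actual Salt state lives in the cortx-prvsnr repo and may differ in detail):

# Count the multipath devices and require at least 7, as the JBOD prerequisite expects
count=$(multipath -ll | grep -c mpath)
if [ "$count" -ge 7 ]; then
    echo "OK: $count multipath devices found"
else
    echo "FAIL: only $count multipath devices found (at least 7 required)" >&2
    exit 1
fi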

johnbent commented 3 years ago

Hello @suntins, since there is a config file, is it possible to run with a different number of disks?

https://github.com/Seagate/cortx/blob/main/doc/scaleout/Configuration_File.rst

suntins commented 3 years ago

I don't think so. It looks like the deployment script (SaltStack) checks this as a prerequisite requirement.

Thank you very much.

Best Regards,

Sakchai S.

PengMacro commented 3 years ago

Thanks.

I was using 1 disk before and have now switched to machines with more than 7 disks. But "multipath -ll | grep mpath | wc -l" outputs "2" in both cases.

Now I get this new error:

[root@s-1 cortx]# /usr/local/bin/provisioner setup_jbod --source iso --iso-cortx cortx-1.0.0-release-43.iso --iso-cortx-deps cortx-1.0.0-prereqs-43.iso --ha --logfile --logfile-filename ./setup.log --config-path config.ini srvnode-1:s-1.novalocal srvnode-2:s-2.novalocal srvnode-3:s-3.novalocal
2020-11-24 20:48:36,145 - INFO - Setup provisioner
2020-11-24 20:48:36,145 - INFO - The type of distribution would be set to DistrType.BUNDLE
2020-11-24 20:48:36,146 - INFO - Starting to build setup 'srvnode-1_root@s-1.novalocal_22__srvnode-2_root@s-2.novalocal_22__srvnode-3_root@s-3.novalocal_22'
2020-11-24 20:48:36,148 - INFO - Profile location '/home/cortx/.provisioner/srvnode-1_root@s-1.novalocal_22__srvnode-2_root@s-2.novalocal_22__srvnode-3_root@s-3.novalocal_22'
2020-11-24 20:48:36,148 - INFO - Generating setup keys [skipped]
2020-11-24 20:48:36,148 - INFO - Generating a roster file
2020-11-24 20:48:36,163 - INFO - Ensuring 'srvnode-1' is ready to accept commands
2020-11-24 20:48:37,423 - INFO - Ensuring 'srvnode-2' is ready to accept commands
2020-11-24 20:48:38,769 - INFO - Ensuring 'srvnode-3' is ready to accept commands
2020-11-24 20:48:40,013 - INFO - Resolving node grains
2020-11-24 20:48:44,495 - INFO - Preparing salt masters / minions configuration
2020-11-24 20:48:56,247 - INFO - srvnode-1 is reachable from other nodes by: {'10.10.2.15', '10.10.2.12', '10.10.2.14', 'host-10-10-2-15.openstacklocal', '10.10.2.13'}
2020-11-24 20:49:08,718 - INFO - srvnode-2 is reachable from other nodes by: {'10.10.2.60', '10.10.2.57', '10.10.2.59', '10.10.2.58', 'host-10-10-2-60.openstacklocal'}
2020-11-24 20:49:20,675 - INFO - srvnode-3 is reachable from other nodes by: {'10.10.2.23', '10.10.2.25', '10.10.2.26', 'host-10-10-2-26.openstacklocal', '10.10.2.24'}
2020-11-24 20:49:20,676 - INFO - salt masters would be set as follows: {'srvnode-1': ['127.0.0.1', 'host-10-10-2-60.openstacklocal', 'host-10-10-2-26.openstacklocal'], 'srvnode-2': ['host-10-10-2-15.openstacklocal', '127.0.0.1', 'host-10-10-2-26.openstacklocal'], 'srvnode-3': ['host-10-10-2-15.openstacklocal', 'host-10-10-2-60.openstacklocal', '127.0.0.1']}
2020-11-24 20:49:33,761 - INFO - Copying config.ini to file root
2020-11-24 20:49:33,765 - INFO - Preparing CORTX repos pillar
2020-11-24 20:49:33,796 - INFO - Installing Cortx yum repositories
2020-11-24 20:51:39,874 - INFO - Setting up paswordless ssh
2020-11-24 20:51:44,688 - INFO - Checking paswordless ssh
2020-11-24 20:51:49,401 - INFO - Installing SaltStack
2020-11-24 20:52:15,164 - INFO - Installing provisioner from a 'iso' source
2020-11-24 20:52:26,620 - INFO - Configuring salt minions
2020-11-24 20:52:31,640 - INFO - Configuring salt-masters
2020-11-24 20:52:43,179 - INFO - Configuring glusterfs servers
2020-11-24 20:53:00,548 - INFO - Configuring glusterfs cluster
2020-11-24 20:53:05,129 - ERROR - provisioner failed
{'srvnode-1': {'glusterfs_|-glusterfs_servers_peered_|-host-10-10-2-60.openstacklocal_|-peered': {'comment': 'Failed to peer with host-10-10-2-60.openstacklocal, please check logs for errors', 'changes': {}}, 'glusterfs_|-glusterfs_servers_peered_|-host-10-10-2-26.openstacklocal_|-peered': {'comment': 'Failed to peer with host-10-10-2-26.openstacklocal, please check logs for errors', 'changes': {}}, 'glusterfs_|-glusterfs_volume_volume_salt_cache_jobs_created_|-volume_salt_cache_jobs_|-volume_present': {'comment': 'One or more requisite failed: glusterfs.cluster.config.glusterfs_servers_peered', 'changes': {}}, 'glusterfs_|-glusterfs_volume_volume_prvsnr_data_created_|-volume_prvsnr_data_|-volume_present': {'comment': 'One or more requisite failed: glusterfs.cluster.config.glusterfs_servers_peered', 'changes': {}}}}

Here is the setup.log

Here is the config.ini:

[cluster]
cluster_ip=10.10.2.131
mgmt_vip=10.10.2.132

[storage_enclosure]
type=JBOD

[srvnode-1]
hostname=s-1.novalocal
network.mgmt_nw.iface=eth0
network.mgmt_nw.public_ip_addr=10.10.2.15
network.mgmt_nw.netmask=255.255.255.0
network.mgmt_nw.gateway=10.10.2.1
network.data_nw.iface=eth0:0
network.data_nw.public_ip_addr=10.10.2.12
network.data_nw.netmask=255.255.255.0
network.data_nw.gateway=10.10.2.1
network.data_nw.pvt_ip_addr=10.10.2.13
is_primary=True
bmc.user=
bmc.secret=

[srvnode-2]
hostname=s-2.novalocal
network.mgmt_nw.iface=eth0
network.mgmt_nw.public_ip_addr=10.10.2.60
network.mgmt_nw.netmask=255.255.255.0
network.mgmt_nw.gateway=10.10.2.1
network.data_nw.iface=eth0:0
network.data_nw.public_ip_addr=10.10.2.57
network.data_nw.netmask=255.255.255.0
network.data_nw.gateway=10.10.2.1
network.data_nw.pvt_ip_addr=10.10.2.58
is_primary=False
bmc.user=
bmc.secret=

[srvnode-3]
hostname=s-3.novalocal
network.mgmt_nw.iface=eth0
network.mgmt_nw.public_ip_addr=10.10.2.26
network.mgmt_nw.netmask=255.255.255.0
network.mgmt_nw.gateway=10.10.2.1
network.data_nw.iface=eth0:0
network.data_nw.public_ip_addr=10.10.2.23
network.data_nw.netmask=255.255.255.0
network.data_nw.gateway=10.10.2.1
network.data_nw.pvt_ip_addr=10.10.2.24
is_primary=False
bmc.user=
bmc.secret=

mukul-seagate11 commented 3 years ago

As this issue looks to be on the provisioner end, transferring it to their repo.

sachitanands commented 3 years ago

Hi @mukul-seagate11, I think you accidentally added my name instead of Sradhanand Pati, so I am removing my name from the owners list.

sradhanand-pati commented 3 years ago

@PengMacro If possible, can you share the gluster logs present in /var/log/? Another thing I see is that you specified one interface in data_nw.iface; it should be two, the first being the public data interface and the second the private data interface (see the illustrative snippet below). Also, can you try using the long hostname/FQDN in config.ini?
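
For illustration only, a two-interface data network entry might look like the line below; eth1 and eth2 are placeholder names, and the authoritative format is whatever the provisioner documentation specifies:

network.data_nw.iface=eth1,eth2

Here the first interface would carry the public data traffic and the second the private data traffic.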

mukul-seagate11 commented 3 years ago

@sradhanand-pati @ypise, please take care of the issue

ypise commented 3 years ago

@PengMacro Greetings.

From the logs in your comments and the config.ini contents, I can see that the hostname values do not match:
From logs: host-10-10-2-60.openstacklocal
From config.ini: s-1.novalocal
Similarly for other nodes as well.

Could you check if the hostnames set on the systems (FQDNs that can be pinged over the network) and the hostname values in config.ini match? They have to be exactly the same, and the FQDNs should be reachable over the management network.
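
As a quick, illustrative way to verify this on each node (where <node-fqdn> is a placeholder for the hostname value used in config.ini):

# System FQDN as the node itself reports it
hostname --fqdn
# Hostname values the provisioner will read from config.ini
grep '^hostname=' config.ini
# Confirm each FQDN resolves and is reachable over the management network
getent hosts <node-fqdn>
ping -c 3 <node-fqdn>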

PengMacro commented 3 years ago

@sradhanand-pati

I switched to another cluster and modified data_nw.iface accordingly. I think the hostname above is already the long one (the output of "hostname --fqdn"). The "/var/log/glusterfs/glusterd.log" file: glusterd.log

@ypise I have changed my hostnames according to the names in the log.

My new config.ini:

[cluster]
cluster_ip=10.10.2.131
mgmt_vip=10.10.2.132

[storage_enclosure]
type=JBOD

[srvnode-1]
hostname=host-10-10-2-215.openstacklocal
network.mgmt_nw.iface=eth0:0
network.mgmt_nw.public_ip_addr=10.10.2.212
network.mgmt_nw.netmask=255.255.255.0
network.mgmt_nw.gateway=10.10.2.1
network.data_nw.iface=eth0:1,eth0:2
network.data_nw.public_ip_addr=10.10.2.213
network.data_nw.netmask=255.255.255.0
network.data_nw.gateway=10.10.2.1
network.data_nw.pvt_ip_addr=10.10.2.214
is_primary=True
bmc.user=
bmc.secret=

[srvnode-2]
hostname=host-10-10-2-156.openstacklocal
network.mgmt_nw.iface=eth0:0
network.mgmt_nw.public_ip_addr=10.10.2.118
network.mgmt_nw.netmask=255.255.255.0
network.mgmt_nw.gateway=10.10.2.1
network.data_nw.iface=eth0:1,eth0:2
network.data_nw.public_ip_addr=10.10.2.119
network.data_nw.netmask=255.255.255.0
network.data_nw.gateway=10.10.2.1
network.data_nw.pvt_ip_addr=10.10.2.120
is_primary=False
bmc.user=
bmc.secret=

[srvnode-3]
hostname=host-10-10-2-138.openstacklocal
network.mgmt_nw.iface=eth0:0
network.mgmt_nw.public_ip_addr=10.10.2.110
network.mgmt_nw.netmask=255.255.255.0
network.mgmt_nw.gateway=10.10.2.1
network.data_nw.iface=eth0:1,eth0:2
network.data_nw.public_ip_addr=10.10.2.111
network.data_nw.netmask=255.255.255.0
network.data_nw.gateway=10.10.2.1
network.data_nw.pvt_ip_addr=10.10.2.112
is_primary=False
bmc.user=
bmc.secret=

I still got this error:

[root@host-10-10-2-215 cortx]# /usr/local/bin/provisioner setup_jbod --source iso --iso-cortx cortx-1.0.0-release-43.iso --iso-cortx-deps cortx-1.0.0-prereqs-43.iso --ha --logfile --logfile-filename ./setup.log --config-path config.ini srvnode-1:host-10-10-2-215.openstacklocal srvnode-2:host-10-10-2-121.openstacklocal srvnode-3:host-10-10-2-113.openstacklocal
2020-11-30 16:11:02,974 - INFO - Setup provisioner
2020-11-30 16:11:02,975 - INFO - The type of distribution would be set to DistrType.BUNDLE
2020-11-30 16:11:02,975 - INFO - Starting to build setup 'srvnode-1_root@host-10-10-2-215.openstacklocal_22__srvnode-2_root@host-10-10-2-121.openstacklocal_22__srvnode-3_root@host-10-10-2-113.openstacklocal_22'
2020-11-30 16:11:02,977 - INFO - Profile location '/home/cortx/.provisioner/srvnode-1_root@host-10-10-2-215.openstacklocal_22__srvnode-2_root@host-10-10-2-121.openstacklocal_22__srvnode-3_root@host-10-10-2-113.openstacklocal_22'
2020-11-30 16:11:02,977 - INFO - Generating setup keys [skipped]
2020-11-30 16:11:02,977 - INFO - Generating a roster file
2020-11-30 16:11:02,983 - INFO - Ensuring 'srvnode-1' is ready to accept commands
2020-11-30 16:11:04,849 - INFO - Ensuring 'srvnode-2' is ready to accept commands
2020-11-30 16:11:06,098 - INFO - Ensuring 'srvnode-3' is ready to accept commands
2020-11-30 16:11:07,241 - INFO - Resolving node grains
2020-11-30 16:11:09,829 - INFO - Preparing salt masters / minions configuration
2020-11-30 16:11:09,851 - INFO - Copying config.ini to file root
2020-11-30 16:11:09,855 - INFO - Preparing CORTX repos pillar
2020-11-30 16:11:09,886 - INFO - Installing Cortx yum repositories
2020-11-30 16:13:12,331 - INFO - Setting up paswordless ssh
2020-11-30 16:13:17,529 - INFO - Checking paswordless ssh
2020-11-30 16:13:22,120 - INFO - Installing SaltStack
2020-11-30 16:13:28,815 - INFO - Installing provisioner from a 'iso' source
2020-11-30 16:13:35,513 - INFO - Configuring salt minions
2020-11-30 16:13:40,730 - INFO - Configuring salt-masters
2020-11-30 16:13:56,955 - INFO - Configuring glusterfs servers
2020-11-30 16:14:04,478 - INFO - Configuring glusterfs cluster
2020-11-30 16:14:08,993 - ERROR - provisioner failed
{'srvnode-1': {'glusterfs_|-glusterfs_servers_peered_|-host-10-10-2-215.openstacklocal_|-peered': {'comment': 'Failed to peer with host-10-10-2-215.openstacklocal, please check logs for errors', 'changes': {}}, 'glusterfs_|-glusterfs_servers_peered_|-host-10-10-2-121.openstacklocal_|-peered': {'comment': 'Failed to peer with host-10-10-2-121.openstacklocal, please check logs for errors', 'changes': {}}, 'glusterfs_|-glusterfs_servers_peered_|-host-10-10-2-113.openstacklocal_|-peered': {'comment': 'Failed to peer with host-10-10-2-113.openstacklocal, please check logs for errors', 'changes': {}}, 'glusterfs_|-glusterfs_volume_volume_prvsnr_data_created_|-volume_prvsnr_data_|-volume_present': {'comment': 'One or more requisite failed: glusterfs.cluster.config.glusterfs_servers_peered', 'changes': {}}, 'glusterfs_|-glusterfs_volume_volume_salt_cache_jobs_created_|-volume_salt_cache_jobs_|-volume_present': {'comment': 'One or more requisite failed: glusterfs.cluster.config.glusterfs_servers_peered', 'changes': {}}}}

Thanks!

ghost commented 3 years ago

Hello @PengMacro, in glusterd.log I see:

[2020-11-30 15:42:29.208799] I [MSGID: 106128] [glusterd-handler.c:3541:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: host-10-10-2-113.openstacklocal (24007)

Could you check that the nodes can reach each other (for example with ping), run "gluster peer status" and "systemctl status glusterd -l" on each node, and verify that glusterd's port 24007 is reachable between the nodes (for example with telnet)?

Also, I wonder how your config.ini matches the command line you used to set up the cluster:

[srvnode-1]
hostname=host-10-10-2-215.openstacklocal
...
[srvnode-2]
hostname=host-10-10-2-156.openstacklocal
...
[srvnode-3]
hostname=host-10-10-2-138.openstacklocal
...

against

[root@host-10-10-2-215 cortx]# /usr/local/bin/provisioner setup_jbod ... srvnode-1:host-10-10-2-215.openstacklocal srvnode-2:host-10-10-2-121.openstacklocal srvnode-3:host-10-10-2-113.openstacklocal

Could you double-check that as well?
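
For instance, a quick (purely illustrative) comparison:

# Hostnames the provisioner reads from config.ini
grep '^hostname=' config.ini
# Compare these against the srvnode-N:<fqdn> arguments passed to setup_jbod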

Thanks

PengMacro commented 3 years ago

I have fixed config.ini so that it matches the command, but I still got the same error.

Do you mean checking using "ping"? If so, host-10-10-2-215.openstacklocal and host-10-10-2-215.openstacklocal can ping each other.

Below are the outputs:

gluster peer status: each node gets "Number of Peers: 0"

systemctl status glusterd -l

srvnode-1

● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-11-30 15:10:50 UTC; 5 days ago
     Docs: man:glusterd(8)
  Process: 2003 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2005 (glusterd)
    Tasks: 10
   CGroup: /system.slice/glusterd.service
           └─2005 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Nov 30 15:10:50 host-10-10-2-215.openstacklocal systemd[1]: Starting GlusterFS, a clustered file-system server...
Nov 30 15:10:50 host-10-10-2-215.openstacklocal systemd[1]: Started GlusterFS, a clustered file-system server.

srvnode-2

● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-11-30 15:10:53 UTC; 5 days ago
     Docs: man:glusterd(8)
  Process: 1903 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1904 (glusterd)
    Tasks: 9
   CGroup: /system.slice/glusterd.service
           └─1904 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Nov 30 15:10:52 host-10-10-2-121.openstacklocal systemd[1]: Starting GlusterFS, a clustered file-system server...
Nov 30 15:10:53 host-10-10-2-121.openstacklocal systemd[1]: Started GlusterFS, a clustered file-system server.

srvnode-3

● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2020-11-30 15:11:21 UTC; 5 days ago
     Docs: man:glusterd(8)
  Process: 1992 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1993 (glusterd)
    Tasks: 9
   CGroup: /system.slice/glusterd.service
           └─1993 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Nov 30 15:11:20 host-10-10-2-113.openstacklocal systemd[1]: Starting GlusterFS, a clustered file-system server...
Nov 30 15:11:21 host-10-10-2-113.openstacklocal systemd[1]: Started GlusterFS, a clustered file-system server.

telnet to port 24007 (run from srvnode-1)

[root@host-10-10-2-215 cortx]# telnet host-10-10-2-121.openstacklocal 24007
Trying 129.114.108.156...
telnet: connect to address 129.114.108.156: No route to host
[root@host-10-10-2-215 cortx]# telnet host-10-10-2-113.openstacklocal 24007
Trying 129.114.108.138...
telnet: connect to address 129.114.108.138: No route to host

srvnode-2 and srvnode-3 also get this "No route to host" error.
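
For what it's worth, "No route to host" on a host that otherwise responds to ping often indicates a local firewall rejecting the connection rather than an actual routing problem. A minimal check along these lines (assuming firewalld, as on a stock CentOS 7 node; the port and hostname are taken from the output above) might help narrow it down:

# On each node, see whether the GlusterFS management port (24007/tcp) is allowed
firewall-cmd --list-ports
firewall-cmd --list-services
# If it is not, open it on every node and re-test the connection
firewall-cmd --permanent --add-port=24007/tcp
firewall-cmd --reload
telnet host-10-10-2-121.openstacklocal 24007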

stale[bot] commented 3 years ago

This issue/pull request has been marked as needs attention as it has been left pending without new activity for 4 days. Tagging @83bhp @andkononykhin2 for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.

justinzw commented 3 years ago

@PengMacro I just wanted to clarify. Was this an issue with the Chameleon Cloud or do you still need us to investigate?

PengMacro commented 3 years ago

Yes, I still have the problem. @ypise told me my error was due to the first ping from node1 to node3 failing. Now the first ping always succeeds (I keep pinging in the background), but I still have the same problem.

mukul-seagate11 commented 3 years ago

@andkononykhin2 @sradhanand-pati, can you address @PengMacro's query so it can be resolved?

PengMacro commented 3 years ago

@johnbent told me I should not use the provisioner to set up multi-node, since the provisioner is a component that is tightly coupled to Seagate hardware, and suggested running multi-node using motr+hare. Now I am following https://github.com/Seagate/cortx-hare#quick-start. I will also try https://github.com/Seagate/cortx-motr/blob/main/scripts/provisioning/README.md#quick-start-windows.

PengMacro commented 3 years ago

I can run multi-node using hare now. Thanks for the help!