Closed Kalpesh-Chhajed closed 4 years ago
@kalpesh.chhajed As part of the fix for this issue, we need to update the README.
Thanks for the clarification. @rajanikant.chirmade, shall we update the README file with the proper steps?
@kalpesh.chhajed Two things:
1. kmod-lustre-client does not need to be installed separately. Installing hare installs all dependencies (lustre-client, Mero, etc.).
2. sudo lctl network configure
Hello @rajanikant.chirmade: do you see any delta in the versions or steps I used during installation?
Two problems I found:
1. While uploading an object to the S3 server, the upload failed and the IO service crashed.
2. After cluster shutdown, re-bootstrap fails to elect a leader.
I am able to bootstrap the cluster with S3 servers on the H/W provided by @kalpesh.chhajed. RPMs were installed from the centos-7.7 repo:
[root@eosnode-1 ~]# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
[root@eosnode-1 ~]# uname -a
Linux eosnode-1 3.10.0-1062.el7.x86_64 #23 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@eosnode-1 ~]#
[root@eosnode-1 ~]# yum repolist enabled | grep ci-storage.mero.colo.seagate.com
ci-storage.mero.colo.seagate.com_releases_eos_integration_centos-7.7.1908_last_successful_ 43
ci-storage.mero.colo.seagate.com_releases_eos_s3server_uploads 35
[root@eosnode-1 ~]# yum install -y --nogpgcheck hare
[root@eosnode-1 ~]# yum install -y --nogpgcheck s3server
[root@eosnode-1 ~]# /opt/seagate/hare/libexec/hare/s3auth-disable
[root@eosnode-1 ~]# modprobe lnet
[root@eosnode-1 ~]# sudo lctl network configure
[root@eosnode-1 ~]# sudo lctl list_nids
10.237.65.176@tcp
[root@eosnode-1 ~]# cat CDF.yaml
nodes:
  - hostname: eosnode-1
    data_iface: eno1
    m0_servers:
      - runs_confd: true
      - io_disks: { path_glob: "/dev/sda" }
    m0_clients:
      s3: 2
      other: 2
pools:
  - name: the pool
    disks: all
    data_units: 1
    parity_units: 0
    # allowed_failures: { site: 0, rack: 0, encl: 0, ctrl: 0, disk: 0 }
[root@eosnode-1 ~]#
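As an aside, with a single disk (/dev/sda) the only workable layout is data_units: 1 with parity_units: 0. A minimal sketch of that sizing check, assuming Mero's N + 2K ≤ P pool-width rule (the rule and the helper name are assumptions here, not quoted from the thread):

```python
def layout_fits(data_units, parity_units, pool_width):
    """Check N + 2K <= P: N data units plus parity and spare copies must fit
    in a pool of P disks (assumed Mero pool-width rule)."""
    return data_units + 2 * parity_units <= pool_width

# The CDF above: one disk (/dev/sda), so N=1, K=0 is the only fit
assert layout_fits(1, 0, 1)
# A parity layout such as N=1, K=1 would need at least 3 disks
assert not layout_fits(1, 1, 1)
assert layout_fits(1, 1, 3)
```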
Bootstrap succeeded with the S3 server instances, but the S3 upload failed (see the s3server errors below).
[root@eosnode-1 ~]# hctl bootstrap --mkfs CDF.yaml
2020-01-09 09:02:29: Generating cluster configuration... Ok.
2020-01-09 09:02:30: Starting Consul server agent on this node.......... Ok.
2020-01-09 09:02:38: Importing configuration into the KV Store... Ok.
2020-01-09 09:02:38: Starting Consul agents on remaining cluster nodes... Ok.
2020-01-09 09:02:38: Update Consul agents configs from the KV Store... Ok.
2020-01-09 09:02:39: Install Mero configuration files... Ok.
2020-01-09 09:02:40: Waiting for the RC Leader to get elected..... Ok.
2020-01-09 09:02:42: Starting Mero (phase1, mkfs)... Ok.
2020-01-09 09:02:49: Starting Mero (phase1, m0d)... Ok.
2020-01-09 09:02:52: Starting Mero (phase2, mkfs)... Ok.
2020-01-09 09:02:57: Starting Mero (phase2, m0d)... Ok.
2020-01-09 09:03:01: Starting S3 servers (phase3)... Ok.
2020-01-09 09:03:02: Checking health of the services... Ok.
[root@eosnode-1 ~]# hctl status
Profile: 0x7000000000000001:0x22
Data Pools:
0x6f00000000000001:0x23
Services:
eosnode-1 (RC)
[started ] hax 0x7200000000000001:0x6 10.237.65.176@tcp:12345:1:1
[started ] confd 0x7200000000000001:0x9 10.237.65.176@tcp:12345:2:1
[started ] ioservice 0x7200000000000001:0xc 10.237.65.176@tcp:12345:2:2
[started ] s3server 0x7200000000000001:0x16 10.237.65.176@tcp:12345:3:1
[started ] s3server 0x7200000000000001:0x19 10.237.65.176@tcp:12345:3:2
[unknown ] m0_client 0x7200000000000001:0x1c 10.237.65.176@tcp:12345:4:1
[unknown ] m0_client 0x7200000000000001:0x1f 10.237.65.176@tcp:12345:4:2
[root@eosnode-1 ~]# ps -xa | grep m0d
30257 ? SLsl 0:01 /usr/bin/m0d -e lnet:10.237.65.176@tcp:12345:2:1 -f <0x7200000000000001:0x9> -T linux -S stobs -D db -A linuxstob:addb-stobs -m 65536 -q 16 -w 8 -c /etc/mero/confd.xc -H 10.237.65.176@tcp:12345:1:1 -U
31301 ? SLsl 0:01 /usr/bin/m0d -e lnet:10.237.65.176@tcp:12345:2:2 -f <0x7200000000000001:0xc> -T ad -S stobs -D db -A linuxstob:addb-stobs -m 65536 -q 16 -w 8 -H 10.237.65.176@tcp:12345:1:1 -U
32733 pts/0 S+ 0:00 grep --color=auto m0d
[root@eosnode-1 ~]# ps -xa | grep s3server
31957 ? SLsl 0:00 s3server --s3pidfile /var/run/s3server.0x7200000000000001:0x16.pid --clovislocal 10.237.65.176@tcp:12345:3:1 --clovisha 10.237.65.176@tcp:12345:1:1 --clovisprofilefid <0x7000000000000001:0x22> --clovisprocessfid <0x7200000000000001:0x16> --s3port 8081 --log_dir /var/log/seagate/s3/s3server-0x7200000000000001:0x16 --disable_auth=true
32125 ? SLsl 0:00 s3server --s3pidfile /var/run/s3server.0x7200000000000001:0x19.pid --clovislocal 10.237.65.176@tcp:12345:3:2 --clovisha 10.237.65.176@tcp:12345:1:1 --clovisprofilefid <0x7000000000000001:0x22> --clovisprocessfid <0x7200000000000001:0x19> --s3port 8082 --log_dir /var/log/seagate/s3/s3server-0x7200000000000001:0x19 --disable_auth=true
32831 pts/0 S+ 0:00 grep --color=auto s3server
[root@eosnode-1 ~]# s3cmd ls
[root@eosnode-1 ~]# s3cmd mb s3://seagate
Bucket 's3://seagate/' created
[root@eosnode-1 ~]# s3cmd ls
2020-01-09 14:03 s3://seagate
[root@eosnode-1 ~]# s3cmd ls s3://seagate
[root@eosnode-1 ~]# vi ~/.s3cfg
[root@eosnode-1 ~]# s3cmd put s3server-1.0.0-B64731_git00f328b_el7.x86_64.rpm s3://seagate
upload: 's3server-1.0.0-B64731_git00f328b_el7.x86_64.rpm' -> 's3://seagate/s3server-1.0.0-B64731_git00f328b_el7.x86_64.rpm' [1 of 1]
7536640 of 11413320 66% in 0s 77.51 MB/s failed
7536640 of 11413320 66% in 0s 73.23 MB/s done
WARNING: Upload failed: /s3server-1.0.0-B64731_git00f328b_el7.x86_64.rpm (500 (InternalError): We encountered an internal error. Please try again.)
WARNING: Waiting 3 sec...
upload: 's3server-1.0.0-B64731_git00f328b_el7.x86_64.rpm' -> 's3://seagate/s3server-1.0.0-B64731_git00f328b_el7.x86_64.rpm' [1 of 1]
7995392 of 11413320 70% in 1s 6.86 MB/s failed
7995392 of 11413320 70% in 1s 6.85 MB/s done
WARNING: Upload failed: /s3server-1.0.0-B64731_git00f328b_el7.x86_64.rpm (500 (InternalError): We encountered an internal error. Please try again.)
WARNING: Waiting 6 sec...
upload: 's3server-1.0.0-B64731_git00f328b_el7.x86_64.rpm' -> 's3://seagate/s3server-1.0.0-B64731_git00f328b_el7.x86_64.rpm' [1 of 1]
8323072 of 11413320 72% in 0s 74.94 MB/s failed
8323072 of 11413320 72% in 0s 73.75 MB/s done
WARNING: Upload failed: /s3server-1.0.0-B64731_git00f328b_el7.x86_64.rpm (500 (InternalError): We encountered an internal error. Please try again.)
WARNING: Waiting 9 sec...
^CSee ya!
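Note s3cmd's retry behavior in the transcript above: each failed upload is retried after a linearly growing wait (3, 6, then 9 seconds). A sketch of that backoff schedule, assuming delay = 3 × attempt, which matches the waits observed above:

```python
def retry_delays(base=3, max_retries=3):
    """s3cmd-style linearly increasing retry delays: base, 2*base, 3*base, ... seconds."""
    return [base * attempt for attempt in range(1, max_retries + 1)]

# Matches the "Waiting 3 sec... 6 sec... 9 sec" sequence in the transcript
assert retry_delays() == [3, 6, 9]
```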
[root@eosnode-1 ~]# hctl status
Profile: 0x7000000000000001:0x22
Data Pools:
0x6f00000000000001:0x23
Services:
eosnode-1 (RC)
[started ] hax 0x7200000000000001:0x6 10.237.65.176@tcp:12345:1:1
[started ] confd 0x7200000000000001:0x9 10.237.65.176@tcp:12345:2:1
[started ] ioservice 0x7200000000000001:0xc 10.237.65.176@tcp:12345:2:2
[started ] s3server 0x7200000000000001:0x16 10.237.65.176@tcp:12345:3:1
[started ] s3server 0x7200000000000001:0x19 10.237.65.176@tcp:12345:3:2
[unknown ] m0_client 0x7200000000000001:0x1c 10.237.65.176@tcp:12345:4:1
[unknown ] m0_client 0x7200000000000001:0x1f 10.237.65.176@tcp:12345:4:2
[root@eosnode-1 ~]# cat /var/log/seagate/s3/s3server-0x7200000000000001\:0x16/s3server.ERROR
Log file created at: 2020/01/09 09:05:17
Running on machine: eosnode-1
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0109 09:05:17.188256 31957 s3_clovis_writer.cc:451] [write_content_failed] [ReqID: 17f59308-bcd3-4de1-8d83-315df7427c15] Write to object failed after writing 0
E0109 09:05:21.308351 31957 s3_clovis_writer.cc:451] [write_content_failed] [ReqID: fcb2d449-a477-4971-8970-bc20eada3b24] Write to object failed after writing 0
E0109 09:05:27.424580 31957 s3_clovis_writer.cc:451] [write_content_failed] [ReqID: 9bcc52c3-17cf-4629-9881-70a30956cd39] Write to object failed after writing 0
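The header of the log file states the line format ([IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg, i.e. glog style). A small sketch that parses such lines, e.g. to collect the failing file:line locations; the regex is an assumption derived from that stated format:

```python
import re

# glog-style line: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
LOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<file>[^:]+):(?P<line>\d+)\] (?P<msg>.*)$"
)

def parse_log_line(line):
    """Return a dict of fields for a glog-style line, or None if it does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

rec = parse_log_line(
    "E0109 09:05:17.188256 31957 s3_clovis_writer.cc:451] "
    "[write_content_failed] [ReqID: 17f59308-bcd3-4de1-8d83-315df7427c15] "
    "Write to object failed after writing 0"
)
# rec["file"], rec["line"] point at the failing source location
```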
[root@eosnode-1 ~]# hctl status
Profile: 0x7000000000000001:0x22
Data Pools:
0x6f00000000000001:0x23
Services:
eosnode-1 (RC)
[started ] hax 0x7200000000000001:0x6 10.237.65.176@tcp:12345:1:1
[started ] confd 0x7200000000000001:0x9 10.237.65.176@tcp:12345:2:1
[offline ] ioservice 0x7200000000000001:0xc 10.237.65.176@tcp:12345:2:2
[started ] s3server 0x7200000000000001:0x16 10.237.65.176@tcp:12345:3:1
[started ] s3server 0x7200000000000001:0x19 10.237.65.176@tcp:12345:3:2
[unknown ] m0_client 0x7200000000000001:0x1c 10.237.65.176@tcp:12345:4:1
[unknown ] m0_client 0x7200000000000001:0x1f 10.237.65.176@tcp:12345:4:2
[root@eosnode-1 ~]# hctl shutdown
Stopping s3server@0x7200000000000001:0x16 at eosnode-1... done
Stopping s3server@0x7200000000000001:0x19 at eosnode-1... done
Stopping m0d@0x7200000000000001:0x9 (confd) at eosnode-1... done
Stopping hare-hax at eosnode-1... done
Stopping hare-consul-agent at eosnode-1... done
Killing RC Leader at eosnode-1... done
[root@eosnode-1 ~]# hctl status
Cluster is not running.
[root@eosnode-1 ~]#
assigned to @rajanikant.chirmade and unassigned @kalpesh.chhajed
@vvv Hare RPM creation includes the following two stages. This job follows the steps below:
@shailesh.vaidya Can you refer me to the code that built the hare-0.1.0-21_gitb9c8f51_m0git973035e25.el7.x86_64 RPM?
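The release tag of that RPM already encodes the build number and both git revisions. A small sketch that extracts them, assuming the `<build>_git<hare-rev>_m0git<mero-rev>.<dist>` naming pattern seen in the file name above (the pattern is an inference from that one example):

```python
import re

# Assumed pattern: <build>_git<hare-rev>_m0git<mero-rev>.<dist>
RELEASE_RE = re.compile(
    r"^(?P<build>\d+)_git(?P<hare_rev>[0-9a-f]+)_m0git(?P<mero_rev>[0-9a-f]+)\."
)

def parse_hare_release(release):
    """Extract build number and hare/mero git revisions from a hare RPM release tag."""
    m = RELEASE_RE.match(release)
    return m.groupdict() if m else None

info = parse_hare_release("21_gitb9c8f51_m0git973035e25.el7")
# info["hare_rev"] is the hare commit, info["mero_rev"] the mero commit
```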
assigned to @kalpesh.chhajed
@kalpesh.chhajed Thanks for reporting the problem!
Can you show the output of the following shell snippet, please?
rpm -ql hare
for f in /etc/yum.repos.d/*; do
    if grep -q lustre-local $f; then
        echo "### $f"
        cat $f
    fi
done
(The code should be executed on eosnode-1.)
The CDF file I am using is:
nodes:
  - hostname: eosnode-1
    data_iface: eno1
    m0_servers:
      - runs_confd: true
      - io_disks: { path_glob: "/dev/sda" }
    m0_clients:
      s3: 2
      other: 2
pools:
  - name: the pool
    disks: all
    data_units: 1
    parity_units: 0
changed title from Problem : Bootstrap fails w{-tih-} "Failed to start hare-hax.service: Unit not found." to Problem : Bootstrap fails w{+ith error+} "Failed to start hare-hax.service: Unit not found."
@vvv @rajanikant.chirmade
I was trying the installation on H/W with the following configuration:
[root@eosnode-1 ~]# uname -r
3.10.0-1062.el7.x86_64
[root@eosnode-1 ~]# cat /etc/system-release
CentOS Linux release 7.7.1908 (Core)
[root@eosnode-1 ~]# rpm -qa | grep lustre
lustre-client-dkms-2.10.4-1.el7.noarch
kmod-lustre-client-2.12.3-1.el7.x86_64
[root@eosnode-1 ~]# rpm -qa | grep mero
mero-1.4.0-11_git973035e25_3.10.0_1062.el7.x86_64
[root@eosnode-1 ~]# rpm -qa | grep hare
perl-threads-shared-1.43-6.el7.x86_64
shared-mime-info-1.8-4.el7.x86_64
hare-0.1.0-21_gitb9c8f51_m0git973035e25.el7.x86_64
[root@eosnode-1 ~]# rpm -qa | grep s3server
s3server-1.0.0-31_git0730db5_el7.x86_64
There's been no activity on this issue for 345600 seconds (that's 4 days for you, hoomans).
Let me ping some Hare maintainers on your behalf... @mssawant, @vvv: Hello there! :wave: OK, done.
I've also set needs-attention
label. Is this worth it? I don't know. But I'm keen to find out! (Oh, who am I kidding. I'm a stateless bot. All those moments will be lost in time, like tears in rain.)
Sorry for the delay. And thank you for contributing to CORTX! (Not bad for a human.)