Seagate / cortx-hare

CORTX Hare configures Motr object store, starts/stops Motr services, and notifies Motr of service and device faults.
https://github.com/Seagate/cortx
Apache License 2.0

"Failed to start hare-hax.service: Unit not found" error while bootstrap #1211

Closed imvenkip closed 4 years ago

imvenkip commented 4 years ago

I got the error below while trying to set up a single-node cluster with the latest Motr and Hare branches on a DevVM (ref: README_developers.md).

-bash-4.2$ sudo hctl bootstrap --mkfs singlenode.yaml
2020-08-13 00:22:11: Generating cluster configuration... OK
2020-08-13 00:22:11: Starting Consul server agent on this node......... OK
2020-08-13 00:22:18: Importing configuration into the KV store... OK
2020-08-13 00:22:18: Starting Consul agents on other cluster nodes... OK
2020-08-13 00:22:18: Updating Consul agents configs from the KV store...Error! No key exists at: m0conf/nodes/ssc-vm-0899.colo.seagate.com/processes/12/meta_data
 OK
2020-08-13 00:22:19: Installing Motr configuration files... OK
2020-08-13 00:22:19: Waiting for the RC Leader to get elected.......... OK
2020-08-13 00:22:26: Starting Motr (phase1, mkfs)...Failed to start hare-hax.service: Unit not found.
-bash-4.2$

-bash-4.2$ hctl status
Cluster is not running

Note: created softlink for m0confgen.

-bash-4.2$ sudo ln -s /home/744290/motr-code/cortx-motr/utils/m0confgen /usr/bin/m0confgen

vvv commented 4 years ago

@imvenkip What is the “DevVM”? Is this an SSC VM?

> 2020-08-13 00:22:26: Starting Motr (phase1, mkfs)...Failed to start hare-hax.service: Unit not found.

This doesn't look like Hare was installed properly.

Did you install Hare from sources or from RPM?
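
One way to confirm which units systemd cannot find before rerunning bootstrap is sketched below. This is only an illustration, not part of Hare: `missing_units` is a hypothetical helper, and the unit names in the usage comment are the ones that appear in this thread. It relies on `systemctl cat` returning a non-zero exit status when no unit file can be found.

```shell
# Hypothetical helper (not part of Hare): print every unit name
# for which systemd cannot locate a unit file.
missing_units() {
    local unit
    for unit in "$@"; do
        # `systemctl cat` exits non-zero if the unit file is not found.
        systemctl cat "$unit" >/dev/null 2>&1 || echo "$unit"
    done
}

# Example call, using unit names mentioned in this thread:
#   missing_units hare-hax.service hare-consul-agent.service motr-kernel.service
```

An empty result would suggest the unit files are at least visible to systemd; any name printed points at an incomplete or broken installation.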

imvenkip commented 4 years ago

> Did you install Hare from sources or from RPM?

Sources. "dev" branch. Last commit ID: fdaaf3cf57385eaf34179ed3858098900d1cecd5

> What is the “DevVM”? Is this an SSC VM?

SSC VM (LDR 2)

vvv commented 4 years ago

@imvenkip Can you show the installation procedure somehow?

imvenkip commented 4 years ago

> @imvenkip Can you show the installation procedure somehow?
>
>   • Make an Asciinema recording perhaps?
>   • Or tell the hostname of that VM so that Hare developer can SSH there and have a look?
>   • Or perhaps we could have a shared screen call some time this week?

hostname: ssc-vm-0899.colo.seagate.com

Source Code Details:

-bash-4.2$ pwd
/home/744290/motr-code
-bash-4.2$ ls
cortx-hare  cortx-motr  m0trace.28584  m0trace.28641  singlenode.yaml
-bash-4.2$

vvv commented 4 years ago

Output of hctl bootstrap --debug --mkfs cfgen/examples/singlenode.yaml command:

```
sc-vm-0899:cortx-hare (dev *)# hctl bootstrap --debug --mkfs cfgen/examples/singlenode.yaml
+ [hare-bootstrap:204] cdf=cfgen/examples/singlenode.yaml
+ [hare-bootstrap:206] sudo systemctl --quiet is-active hare-consul-agent
+ [hare-bootstrap:210] [[ -z '' ]]
+ [hare-bootstrap:211] conf_dir=/var/lib/hare
+ [hare-bootstrap:213] [[ -d /var/lib/hare ]]
+ [hare-bootstrap:220] [[ -w /var/lib/hare ]]
+ [hare-bootstrap:232] say 'Generating cluster configuration...'
++ [hare-bootstrap:53:say] date '+%F %T'
+ [hare-bootstrap:53:say] echo -n '2020-08-13 04:57:58: Generating cluster configuration...'
2020-08-13 04:57:58: Generating cluster configuration...+++ [hare-bootstrap:233] readlink -f /opt/seagate/cortx/hare/libexec/hare-bootstrap
++ [hare-bootstrap:233] dirname /home/744290/motr-code/cortx-hare/utils/hare-bootstrap
+ [hare-bootstrap:233] PATH=/home/744290/motr-code/cortx-hare/utils:/opt/seagate/cortx/hare/bin:/opt/seagate/cortx/hare/libexec:/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/puppetlabs/bin:/root/bin
+ [hare-bootstrap:233] cfgen -o /var/lib/hare cfgen/examples/singlenode.yaml
+ [hare-bootstrap:234] dhall text
+ [hare-bootstrap:234] m0confgen
+ [hare-bootstrap:236] read node _
++ [hare-bootstrap:247] get_all_nodes
++ [hare-bootstrap:67:get_all_nodes] jq -r '(.servers + .clients)[] | "\(.node_name) \(.ipaddr)"' /var/lib/hare/consul-agents.json
+ [hare-bootstrap:237] is_localhost localhost
+ [hare-bootstrap:161:is_localhost] (( 1 == 1 ))
+ [hare-bootstrap:161:is_localhost] [[ -n localhost ]]
+ [hare-bootstrap:162:is_localhost] local node=localhost
+ [hare-bootstrap:163:is_localhost] case $node in
+ [hare-bootstrap:164:is_localhost] return 0
+ [hare-bootstrap:238] echo localhost
+ [hare-bootstrap:236] read node _
+ [hare-bootstrap:246] echo ' OK'
 OK
+ [hare-bootstrap:249] abort_if_RC_leader_election_is_impossible
+ [hare-bootstrap:109:abort_if_RC_leader_election_is_impossible] local ssh node confd_id svc cmd
+ [hare-bootstrap:110:abort_if_RC_leader_election_is_impossible] status_commands=()
+ [hare-bootstrap:110:abort_if_RC_leader_election_is_impossible] local status_commands
+ [hare-bootstrap:114:abort_if_RC_leader_election_is_impossible] IFS=/
+ [hare-bootstrap:114:abort_if_RC_leader_election_is_impossible] read node confd_id
++ [hare-bootstrap:108:abort_if_RC_leader_election_is_impossible] jq -r '.[] | .key' /var/lib/hare/consul-kv.json
++ [hare-bootstrap:109:abort_if_RC_leader_election_is_impossible] grep '/services/confd$'
++ [hare-bootstrap:111:abort_if_RC_leader_election_is_impossible] cut -d/ -f3,5
+ [hare-bootstrap:115:abort_if_RC_leader_election_is_impossible] local ok=true
++ [hare-bootstrap:116:abort_if_RC_leader_election_is_impossible] printf 0x7200000000000001:0x%x 9
+ [hare-bootstrap:116:abort_if_RC_leader_election_is_impossible] confd_id=m0d@0x7200000000000001:0x9
++ [hare-bootstrap:117:abort_if_RC_leader_election_is_impossible] node-name
+ [hare-bootstrap:117:abort_if_RC_leader_election_is_impossible] [[ localhost == localhost ]]
+ [hare-bootstrap:118:abort_if_RC_leader_election_is_impossible] ssh=
+ [hare-bootstrap:123:abort_if_RC_leader_election_is_impossible] for svc in hare-hax '$confd_id'
+ [hare-bootstrap:124:abort_if_RC_leader_election_is_impossible] sudo systemctl --quiet --state=failed is-failed hare-hax
+ [hare-bootstrap:123:abort_if_RC_leader_election_is_impossible] for svc in hare-hax '$confd_id'
+ [hare-bootstrap:124:abort_if_RC_leader_election_is_impossible] sudo systemctl --quiet --state=failed is-failed m0d@0x7200000000000001:0x9
+ [hare-bootstrap:131:abort_if_RC_leader_election_is_impossible] true
+ [hare-bootstrap:134:abort_if_RC_leader_election_is_impossible] return 0
+ [hare-bootstrap:252] read _ join_ip
++ [hare-bootstrap:252] get_server_nodes
++ [hare-bootstrap:57:get_server_nodes] jq -r '.servers[] | "\(.node_name) \(.ipaddr)"' /var/lib/hare/consul-agents.json
+++ [hare-bootstrap:252] node-name
++ [hare-bootstrap:252] grep -w localhost
+ [hare-bootstrap:254] [[ -z 192.168.47.27 ]]
+ [hare-bootstrap:264] say 'Starting Consul server agent on this node...'
++ [hare-bootstrap:53:say] date '+%F %T'
+ [hare-bootstrap:53:say] echo -n '2020-08-13 04:57:59: Starting Consul server agent on this node...'
2020-08-13 04:57:59: Starting Consul server agent on this node...+ [hare-bootstrap:266] mk-consul-env --mode server --bind 192.168.47.27 --extra-options '-ui -bootstrap-expect 1'
+ [hare-bootstrap:269] sudo systemctl start hare-consul-agent
+ [hare-bootstrap:273] consul info
+ [hare-bootstrap:273] grep -q 'leader.*true'
+ [hare-bootstrap:274] sleep 1
+ [hare-bootstrap:275] echo -n .
.+ [hare-bootstrap:273] consul info
+ [hare-bootstrap:273] grep -q 'leader.*true'
+ [hare-bootstrap:274] sleep 1
+ [hare-bootstrap:275] echo -n .
.+ [hare-bootstrap:273] consul info
+ [hare-bootstrap:273] grep -q 'leader.*true'
+ [hare-bootstrap:274] sleep 1
+ [hare-bootstrap:275] echo -n .
.+ [hare-bootstrap:273] consul info
+ [hare-bootstrap:273] grep -q 'leader.*true'
+ [hare-bootstrap:274] sleep 1
+ [hare-bootstrap:275] echo -n .
.+ [hare-bootstrap:273] consul info
+ [hare-bootstrap:273] grep -q 'leader.*true'
+ [hare-bootstrap:274] sleep 1
+ [hare-bootstrap:275] echo -n .
.+ [hare-bootstrap:273] consul info
+ [hare-bootstrap:273] grep -q 'leader.*true'
+ [hare-bootstrap:274] sleep 1
+ [hare-bootstrap:275] echo -n .
.+ [hare-bootstrap:273] consul info
+ [hare-bootstrap:273] grep -q 'leader.*true'
+ [hare-bootstrap:277] echo ' OK'
 OK
+ [hare-bootstrap:279] say 'Importing configuration into the KV store...'
++ [hare-bootstrap:53:say] date '+%F %T'
+ [hare-bootstrap:53:say] echo -n '2020-08-13 04:58:06: Importing configuration into the KV store...'
2020-08-13 04:58:06: Importing configuration into the KV store...+ [hare-bootstrap:280] jq '[.[] | {key, value: (.value | @base64)}]'
+ [hare-bootstrap:281] consul kv import -
+ [hare-bootstrap:282] echo ' OK'
 OK
+ [hare-bootstrap:284] say 'Starting Consul agents on other cluster nodes...'
++ [hare-bootstrap:53:say] date '+%F %T'
+ [hare-bootstrap:53:say] echo -n '2020-08-13 04:58:06: Starting Consul agents on other cluster nodes...'
2020-08-13 04:58:06: Starting Consul agents on other cluster nodes...+ [hare-bootstrap:285] pids=()
+ [hare-bootstrap:286] read node bind_ip
++ [hare-bootstrap:291] get_server_nodes
++ [hare-bootstrap:57:get_server_nodes] jq -r '.servers[] | "\(.node_name) \(.ipaddr)"' /var/lib/hare/consul-agents.json
+++ [hare-bootstrap:291] node-name
++ [hare-bootstrap:291] grep -vw localhost
++ [hare-bootstrap:291] true
+ [hare-bootstrap:293] read node bind_ip
++ [hare-bootstrap:298] get_client_nodes
++ [hare-bootstrap:62:get_client_nodes] jq -r '.clients[] | "\(.node_name) \(.ipaddr)"' /var/lib/hare/consul-agents.json
+ [hare-bootstrap:299] wait4
+ [hare-bootstrap:300] agents_nr=1
+ [hare-bootstrap:303] count=1
++ [hare-bootstrap:304] get_ready_agents_nr
++ [hare-bootstrap:105:get_ready_agents_nr] consul members
++ [hare-bootstrap:105:get_ready_agents_nr] sed 1d
++ [hare-bootstrap:105:get_ready_agents_nr] wc -l
+ [hare-bootstrap:304] (( 1 != 1 ))
+ [hare-bootstrap:316] echo ' OK'
 OK
+ [hare-bootstrap:318] say 'Updating Consul agents configs from the KV store...'
++ [hare-bootstrap:53:say] date '+%F %T'
+ [hare-bootstrap:53:say] echo -n '2020-08-13 04:58:06: Updating Consul agents configs from the KV store...'
2020-08-13 04:58:06: Updating Consul agents configs from the KV store...+ [hare-bootstrap:320] pids=($!)
+ [hare-bootstrap:319] update-consul-conf
+ [hare-bootstrap:321] read node _
++ [hare-bootstrap:324] get_all_nodes
++ [hare-bootstrap:67:get_all_nodes] jq -r '(.servers + .clients)[] | "\(.node_name) \(.ipaddr)"' /var/lib/hare/consul-agents.json
+++ [hare-bootstrap:324] node-name
++ [hare-bootstrap:324] grep -vw localhost
++ [hare-bootstrap:324] true
+ [hare-bootstrap:325] wait4 1861
+ [hare-bootstrap:95:wait4] for pid in '$*'
+ [hare-bootstrap:96:wait4] wait 1861
Error! No key exists at: m0conf/nodes/localhost/processes/12/meta_data
+ [hare-bootstrap:326] echo ' OK'
 OK
+ [hare-bootstrap:328] say 'Installing Motr configuration files...'
++ [hare-bootstrap:53:say] date '+%F %T'
+ [hare-bootstrap:53:say] echo -n '2020-08-13 04:58:06: Installing Motr configuration files...'
2020-08-13 04:58:06: Installing Motr configuration files...+ [hare-bootstrap:329] read node _
++ [hare-bootstrap:331] get_server_nodes
++ [hare-bootstrap:57:get_server_nodes] jq -r '.servers[] | "\(.node_name) \(.ipaddr)"' /var/lib/hare/consul-agents.json
+++ [hare-bootstrap:331] node-name
++ [hare-bootstrap:331] grep -vw localhost
++ [hare-bootstrap:331] true
+ [hare-bootstrap:332] echo ' OK'
 OK
+ [hare-bootstrap:334] say 'Waiting for the RC Leader to get elected...'
++ [hare-bootstrap:53:say] date '+%F %T'
+ [hare-bootstrap:53:say] echo -n '2020-08-13 04:58:06: Waiting for the RC Leader to get elected...'
2020-08-13 04:58:06: Waiting for the RC Leader to get elected...+ [hare-bootstrap:335] wait_rc_leader
+ [hare-bootstrap:82:wait_rc_leader] local count=1
++ [hare-bootstrap:83:wait_rc_leader] get_session
++ [hare-bootstrap:72:get_session] consul kv get -detailed leader
++ [hare-bootstrap:72:get_session] awk '/Session/ {print $2}'
+ [hare-bootstrap:83:wait_rc_leader] [[ - == \- ]]
+ [hare-bootstrap:84:wait_rc_leader] (( 1 > 5 ))
+ [hare-bootstrap:88:wait_rc_leader] sleep 1
+ [hare-bootstrap:89:wait_rc_leader] echo -n .
.+ [hare-bootstrap:90:wait_rc_leader] (( count++ ))
++ [hare-bootstrap:83:wait_rc_leader] get_session
++ [hare-bootstrap:72:get_session] consul kv get -detailed leader
++ [hare-bootstrap:72:get_session] awk '/Session/ {print $2}'
+ [hare-bootstrap:83:wait_rc_leader] [[ - == \- ]]
+ [hare-bootstrap:84:wait_rc_leader] (( 2 > 5 ))
+ [hare-bootstrap:88:wait_rc_leader] sleep 1
+ [hare-bootstrap:89:wait_rc_leader] echo -n .
.+ [hare-bootstrap:90:wait_rc_leader] (( count++ ))
++ [hare-bootstrap:83:wait_rc_leader] get_session
++ [hare-bootstrap:72:get_session] consul kv get -detailed leader
++ [hare-bootstrap:72:get_session] awk '/Session/ {print $2}'
+ [hare-bootstrap:83:wait_rc_leader] [[ - == \- ]]
+ [hare-bootstrap:84:wait_rc_leader] (( 3 > 5 ))
+ [hare-bootstrap:88:wait_rc_leader] sleep 1
+ [hare-bootstrap:89:wait_rc_leader] echo -n .
.+ [hare-bootstrap:90:wait_rc_leader] (( count++ ))
++ [hare-bootstrap:83:wait_rc_leader] get_session
++ [hare-bootstrap:72:get_session] consul kv get -detailed leader
++ [hare-bootstrap:72:get_session] awk '/Session/ {print $2}'
+ [hare-bootstrap:83:wait_rc_leader] [[ e201feb7-c831-fe82-7469-ee331a0a3ca8 == \- ]]
++ [hare-bootstrap:336] get_session
++ [hare-bootstrap:72:get_session] consul kv get -detailed leader
++ [hare-bootstrap:72:get_session] awk '/Session/ {print $2}'
+ [hare-bootstrap:336] sid=e201feb7-c831-fe82-7469-ee331a0a3ca8
++ [hare-bootstrap:341] get_session_checks_nr e201feb7-c831-fe82-7469-ee331a0a3ca8
++ [hare-bootstrap:76:get_session_checks_nr] local sid=e201feb7-c831-fe82-7469-ee331a0a3ca8
++ [hare-bootstrap:77:get_session_checks_nr] curl -sX GET http://localhost:8500/v1/session/info/e201feb7-c831-fe82-7469-ee331a0a3ca8
++ [hare-bootstrap:78:get_session_checks_nr] jq -r '.[].Checks|length'
+ [hare-bootstrap:341] (( 3 == 1 ))
+ [hare-bootstrap:346] echo ' OK'
 OK
+ [hare-bootstrap:387] bootstrap_nodes phase1
+ [hare-bootstrap:378:bootstrap_nodes] local phase=phase1
+ [hare-bootstrap:380:bootstrap_nodes] [[ -n --mkfs ]]
+ [hare-bootstrap:381:bootstrap_nodes] start_motr mkfs phase1
+ [hare-bootstrap:360:start_motr] local op=mkfs
+ [hare-bootstrap:361:start_motr] local phase=phase1
+ [hare-bootstrap:363:start_motr] say 'Starting Motr (phase1, mkfs)...'
++ [hare-bootstrap:53:say] date '+%F %T'
+ [hare-bootstrap:53:say] echo -n '2020-08-13 04:58:10: Starting Motr (phase1, mkfs)...'
2020-08-13 04:58:10: Starting Motr (phase1, mkfs)...+ [hare-bootstrap:364:start_motr] [[ mkfs == \m\k\f\s ]]
+ [hare-bootstrap:364:start_motr] op=--mkfs-only
+ [hare-bootstrap:366:start_motr] pids=($!)
+ [hare-bootstrap:365:start_motr] bootstrap-node --mkfs-only --phase phase1
+ [hare-bootstrap:368:start_motr] read node _
++ [hare-bootstrap:359:start_motr] get_nodes phase1
++ [hare-bootstrap:349:get_nodes] local phase=phase1
++ [hare-bootstrap:351:get_nodes] [[ phase1 == phase1 ]]
++ [hare-bootstrap:353:get_nodes] get_server_nodes
++ [hare-bootstrap:57:get_server_nodes] jq -r '.servers[] | "\(.node_name) \(.ipaddr)"' /var/lib/hare/consul-agents.json
+++ [hare-bootstrap:359:start_motr] node-name
++ [hare-bootstrap:359:start_motr] grep -vw localhost
++ [hare-bootstrap:359:start_motr] true
+ [hare-bootstrap:372:start_motr] wait4 2480
+ [hare-bootstrap:95:wait4] for pid in '$*'
+ [hare-bootstrap:96:wait4] wait 2480
Failed to start hare-hax.service: Unit not found.
5💥 ssc-vm-0899:cortx-hare (dev *)# :
ssc-vm-0899:cortx-hare (dev *)#
```

vvv commented 4 years ago

#600 had similar symptoms. @rajanikantchirmade diagnosed the problem then as an LNet misconfiguration.

I'm not sure this is the case this time though.

ssc-vm-0899:cortx-hare (dev *)# lctl list_nids
192.168.47.27@tcp

mssawant commented 4 years ago

Looks like the Motr state is not right:

[root@ssc-vm-0899 motr-code]# systemctl status motr-kernel
Unit motr-kernel.service could not be found.
lrwxrwxrwx 1 root root 86 Aug 12 23:48 /usr/lib/systemd/system/motr-kernel.service -> /home/744290/motr-code/motr/scripts/install/usr/lib/systemd/system/motr-kernel.service
[root@ssc-vm-0899 motr-code]# ls -l /home/744290/motr-code/motr/scripts/install/usr/lib/systemd/system/motr-kernel.service
ls: cannot access /home/744290/motr-code/motr/scripts/install/usr/lib/systemd/system/motr-kernel.service: No such file or directory

The symlink target does not exist. This is the right path:

/home/744290/motr-code/cortx-motr/scripts/install/usr/lib/systemd/system/motr-kernel.service

So there is some problem with the Motr installation.
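
A dangling unit symlink like the one above can be detected generically. The following is a minimal sketch, not something Motr or Hare ships: `find_dangling_links` is a hypothetical helper, and the default directory is simply the unit path shown in this comment.

```shell
# Hypothetical helper: list symlinks in a directory whose targets
# do not exist. Defaults to the systemd unit directory from this thread.
find_dangling_links() {
    local dir=${1:-/usr/lib/systemd/system} f
    for f in "$dir"/*; do
        # -L: the entry is a symlink; ! -e: its target cannot be resolved.
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            printf '%s -> %s\n' "$f" "$(readlink "$f")"
        fi
    done
}

# Example call:
#   find_dangling_links /usr/lib/systemd/system
```

Each printed line shows a link and the missing path it points to, which makes a stale source-tree location easy to spot.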

vvv commented 4 years ago

I rebuilt and reinstalled cortx-motr:

cd /home/744290/motr-code/cortx-motr
scripts/install-motr-service --uninstall
git clean -fdx
scripts/m0 make
scripts/install-motr-service --link

and then cortx-hare:

cd /home/744290/motr-code/cortx-hare
make uninstall
make distclean
make devinstall

Cluster started successfully:

# time hctl bootstrap --mkfs cfgen/examples/singlenode.yaml 
2020-08-13 16:07:46: Generating cluster configuration... OK
2020-08-13 16:07:47: Starting Consul server agent on this node............ OK
2020-08-13 16:07:56: Importing configuration into the KV store... OK
2020-08-13 16:07:56: Starting Consul agents on other cluster nodes... OK
2020-08-13 16:07:56: Updating Consul agents configs from the KV store...Error! No key exists at: m0conf/nodes/localhost/processes/12/meta_data
 OK
2020-08-13 16:07:57: Installing Motr configuration files... OK
2020-08-13 16:07:57: Waiting for the RC Leader to get elected....... OK
2020-08-13 16:08:01: Starting Motr (phase1, mkfs)... OK
2020-08-13 16:08:08: Starting Motr (phase1, m0d)... OK
2020-08-13 16:08:09: Starting Motr (phase2, mkfs)... OK
2020-08-13 16:08:18: Starting Motr (phase2, m0d)... OK
2020-08-13 16:08:20: Checking health of services... OK

real    0m34.899s
user    0m4.939s
sys     0m1.515s
ssc-vm-0899:cortx-hare (dev *)# hctl status
Profile: 0x7000000000000001:0x28
Data pools:
    0x6f00000000000001:0x29
Services:
    localhost  (RC)
    [started]  hax        0x7200000000000001:0x6   192.168.47.27@tcp:12345:1:1
    [started]  confd      0x7200000000000001:0x9   192.168.47.27@tcp:12345:2:1
    [started]  ioservice  0x7200000000000001:0xc   192.168.47.27@tcp:12345:2:2
    [unknown]  m0_client  0x7200000000000001:0x22  192.168.47.27@tcp:12345:4:1
    [unknown]  m0_client  0x7200000000000001:0x25  192.168.47.27@tcp:12345:4:2
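
To list the services that are not in the started state, the `hctl status` output can be filtered. This is a minimal sketch that assumes the output layout shown above, with the state in square brackets as the first field; `not_started` is a hypothetical helper, not an hctl feature.

```shell
# Hypothetical helper: read `hctl status` output on stdin and print
# the state, service name, and fid of every service line whose state
# is anything other than [started].
not_started() {
    awk '$1 ~ /^\[.*\]$/ && $1 != "[started]" { print $1, $2, $3 }'
}

# Example call:
#   hctl status | not_started
```

Lines such as "Services:" or the node name are skipped because their first field is not a bracketed state.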