Seagate / cortx-motr

CORTX Motr is a distributed object and key-value storage system targeting mass capacity storage configurations. It's the core component of the CORTX storage system.
https://github.com/Seagate/cortx
Apache License 2.0

Phase2 mkfs failed during cluster bootstrapping after recent updates #2071

Closed faradawn closed 2 years ago

faradawn commented 2 years ago

Problem

When bootstrapping a single-node Hare cluster, I got a "phase 2 mkfs failed" error:

[cc@sky-2 cortx-hare]$ hctl bootstrap --mkfs CDF.yaml 
2022-08-11 17:29:28: Generating cluster configuration... OK
2022-08-11 17:29:30: Starting Consul server on this node......... OK
2022-08-11 17:29:36: Importing configuration into the KV store... OK
2022-08-11 17:29:36: Starting Consul on other nodes...Consul ready on all nodes
2022-08-11 17:29:37: Updating Consul configuration from the KV store... OK
2022-08-11 17:29:38: Waiting for the RC Leader to get elected........... OK
2022-08-11 17:29:46: Starting Motr (phase1, mkfs)... OK
2022-08-11 17:29:56: Starting Motr (phase1, m0d)... OK
2022-08-11 17:29:58: Starting Motr (phase2, mkfs)...Job for motr-mkfs@0x7200000000000001:0x2.service failed because the control process exited with error code. See "systemctl status motr-mkfs@0x7200000000000001:0x2.service" and "journalctl -xe" for details.
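
The error message points at `systemctl status` and `journalctl` for details. A minimal sketch of collecting both for the failed unit is shown here. The function names `mkfs_unit` and `show_mkfs_failure` are hypothetical helpers, not part of Hare or Motr, and the `motr-mkfs@<fid>.service` pattern is assumed from the message above.

```shell
#!/bin/sh
# Build the systemd unit name for a Motr mkfs instance from its process fid.
# Assumption: the unit follows the "motr-mkfs@<fid>.service" pattern seen
# in the bootstrap error message.
mkfs_unit() {
    printf 'motr-mkfs@%s.service\n' "$1"
}

# Print the unit status and its most recent journal entries.
# Hypothetical helper; it is not called automatically and typically
# needs to run on the affected node with sufficient privileges.
show_mkfs_failure() {
    unit=$(mkfs_unit "$1")
    systemctl status "$unit" --no-pager
    journalctl -u "$unit" --no-pager -n 200
}
```

For the failure above, this would be called as `show_mkfs_failure 0x7200000000000001:0x2`, and the resulting output can be attached to the issue.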

Expected behavior

Three days ago, following the same procedure, the bootstrap was successful. Has there been an update since then that affected the deployment?

How to reproduce

The failure can be reproduced by following the procedure in this guide: https://github.com/faradawn/tutorials/blob/main/linux/cortx/motr_guide.md

Deployment information

Using the latest commit as of Aug 10, commit id: 553163b6e96cc4f7aea3f9a81d9df0824fdc4ee4. Running on Skylake, CentOS 7.9.

Additional information

The disk layout was as follows:

sda      8:0    0 223.6G  0 disk
├─sda1   8:1    0   550M  0 part /boot/efi
├─sda2   8:2    0     8M  0 part
└─sda3   8:3    0   223G  0 part /
loop0    7:0    0   9.8G  0 loop
loop1    7:1    0   9.8G  0 loop
loop2    7:2    0   9.8G  0 loop
loop3    7:3    0   9.8G  0 loop
loop4    7:4    0   9.8G  0 loop
loop5    7:5    0   9.8G  0 loop /mnt/extra/loop-devs/loop0
loop6    7:6    0   9.8G  0 loop /mnt/extra/loop-devs/loop1
loop7    7:7    0   9.8G  0 loop /mnt/extra/loop-devs/loop2
loop8    7:8    0   9.8G  0 loop /mnt/extra/loop-devs/loop3
loop9    7:9    0   9.8G  0 loop /mnt/extra/loop-devs/loop4
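
In the listing above, loop5 through loop9 are mounted while loop0 through loop4 are not. A small sketch for listing which loop devices are mounted before a bootstrap attempt follows. The helper name `mounted_loops` is hypothetical, and it assumes input in the three-column form produced by `lsblk -nl -o NAME,TYPE,MOUNTPOINT`.

```shell
#!/bin/sh
# Read "NAME TYPE [MOUNTPOINT]" lines on stdin and print only the loop
# devices that have a mount point, as "NAME MOUNTPOINT".
# Hypothetical helper; it only reports state and changes nothing.
mounted_loops() {
    awk '$2 == "loop" && NF >= 3 { print $1, $3 }'
}

# Example invocation on a live system (left commented out on purpose):
# lsblk -nl -o NAME,TYPE,MOUNTPOINT | mounted_loops
```

Whether a mounted loop device matters for mkfs is not established in this report; the check only makes the current state explicit.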

I also tried the following, none of which changed the result:

  1. mounting or not mounting the loop devices
  2. formatting or not formatting the loop files as ext4 prior to deployment
  3. enabling or not enabling the debug trace level at configure time: ./configure --with-trace-max-level=M0_DEBUG

Would appreciate any help! Thanks!

cortx-admin commented 2 years ago

For the convenience of the Seagate development team, this issue has been mirrored in a private Seagate Jira Server: https://jts.seagate.com/browse/CORTX-33923. Note that community members will not be able to access that Jira server but that is not a problem since all activity in that Jira mirror will be copied into this GitHub issue.

hessio commented 2 years ago

You can see this conversation in Slack where there was a response to this issue: https://cortxcommunity.slack.com/archives/C019S0SGWNQ/p1660238534535739

cortx-admin commented 2 years ago

Chandradhar Raval commented in Jira Server:

This is a duplicate of CORTX-33876

cortx-admin commented 2 years ago

Vaibhav Prakash Paratwar commented in Jira Server:

Duplicate as per previous comment

cortx-admin commented 2 years ago

Vaibhav Prakash Paratwar commented in Jira Server:

Addressed by CORTX-33876

cortx-admin commented 2 years ago

Vaibhav Prakash Paratwar commented in Jira Server:

[~530903] Please reassign appropriately for verification, as the duplicate issue has been addressed in the hare repo.
