Seagate / cortx-hare

CORTX Hare configures Motr object store, starts/stops Motr services, and notifies Motr of service and device faults.
https://github.com/Seagate/cortx
Apache License 2.0

CORTX-32483: all2all dtm test fails #2126

Closed mssawant closed 2 years ago

mssawant commented 2 years ago

On process restart, before replying to the first entrypoint request, Hare notifies the process as M0_NC_FAILED to the rest of the Motr cluster. This blocks DTM recovery from completing on process restart, because the process is marked FAILED rather than OFFLINE.

Solution: Notify OFFLINE instead of FAILED on process restart.
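
Conceptually the fix is a one-state swap in what hax broadcasts for a restarting process. Below is a minimal sketch; the function and enum are hypothetical illustrations, not Hare's actual code, and it assumes hctl's "offline" corresponds to Motr's M0_NC_TRANSIENT state while "failed" corresponds to M0_NC_FAILED:

from enum import Enum

# Subset of Motr HA object states relevant to this change.
class HaState(Enum):
    M0_NC_ONLINE = 1
    M0_NC_FAILED = 2
    M0_NC_TRANSIENT = 3   # shown as "offline" by hctl status

def state_to_broadcast(process_is_restarting: bool) -> HaState:
    """Hypothetical helper illustrating the fix: before replying to the
    first entrypoint request of a restarting process, announce it as
    transient/offline rather than failed, so peers keep DTM recovery
    pending instead of treating the process as permanently gone."""
    if process_is_restarting:
        return HaState.M0_NC_TRANSIENT  # previously: HaState.M0_NC_FAILED
    return HaState.M0_NC_FAILED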

Signed-off-by: Mandar Sawant mandar.sawant@seagate.com

vaibhavparatwar commented 2 years ago

@mssawant what is the JIRA we can link this PR to?

pavankrishnat commented 2 years ago

Mandar, with the above changes, Happy Path IOs are failing on both 5N (sanity job) and 15N (tested manually) VMs.

Build: custom-ci #6904
Deployment: setup-cortx-rgw-cluster #7759
Sanity: Hare_k8s_Sanity_PR #164

Sanity error log: Hare_Sanity_164_consoleText.txt. Failed tests:

 FAILED tests/cft/test_io_workload.py::TestIOWorkload::test_basic_io - commons...
 FAILED tests/csm/rest/test_iam_users.py::TestIamUserRGW::test_37016 - commons...
 FAILED tests/s3/test_data_path_validation.py::TestDataPathValidation::test_1701[1000-1M]
 FAILED tests/s3/test_object_workflow_operations.py::TestObjectWorkflowOperations::test_delete_object_2220

Error info:

=================================== FAILURES ===================================
_________________________ TestIOWorkload.test_basic_io _________________________
        LOGGER.info("Putting object")
        try:
            response = super().put_object(bucket_name, object_name, file_path, **kwargs)
        except (ClientError, Exception) as error:
            LOGGER.error("Error in %s: %s",
                         S3TestLib.put_object.__name__,
                         error)
>           raise CTException(err.S3_CLIENT_ERROR, error.args[0]) from error
E           commons.exceptions.CTException: CTException: EC(4007)
E           Error Desc: S3 Client Error
E           Error Message: An error occurred (UnknownError) when calling the PutObject operation (reached max retries: 6): Unknown
E           Other info:
E           {}
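
For context, the "reached max retries: 6" text above comes from the S3 client's retry layer (the test catches botocore's ClientError, which points at a boto3 client). A minimal sketch of how such a retry budget is configured; the endpoint, credentials, and values here are illustrative, not necessarily what cortx-test uses:

import boto3
from botocore.config import Config

# Illustrative S3 client with a bounded retry budget; once the budget is
# exhausted the client surfaces the last error, as in the failure above.
s3 = boto3.client(
    's3',
    endpoint_url='http://10.110.208.81:80',   # endpoint seen in the 15N run below
    aws_access_key_id='placeholder',          # hypothetical credentials
    aws_secret_access_key='placeholder',
    config=Config(retries={'max_attempts': 6, 'mode': 'standard'}),
)
s3.put_object(Bucket='test', Key='example-object', Body=b'payload')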

On 15N:

Write file...
+ aws s3 ls s3://test/ --endpoint-url http://10.110.208.81:80
+ aws s3 cp sanityIO1gb s3://test/sanityIO1gb --endpoint-url http://10.110.208.81:80
upload failed: ./sanityIO1gb to s3://test/sanityIO1gb An error occurred (InvalidArgument) when calling the UploadPart operation: Unknown

Degraded-mode (dg) test failed: https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/185/console (test_degraded_iteration_write_read_partial_delete, DegradedPath-Type4)

[2022-06-24 04:23:41] [MainThread] [DEBUG ] [system_utils.py: 165]: Command: s3bench -accessKey=IOF6TNDN8C2TOETQYIOC -accessSecret=Zb2dlZb51Ru84WJU15tlGXr7Xx2RW2WxnRel9uj9 -bucket=test-40174-bkt-0-1656066194.826625 -endpoint=https://192.168.62.236:30443/ -numClients=10 -numSamples=315 -objectNamePrefix=obj_134217728 -objectSize=134217728b -skipSSLCertVerification=True -s3MaxRetries=5 -httpClientTimeout=500000 -region us-east-1 -skipCleanup -validate >> log/latest/write_workload_134217728b_s3bench_10_315_134217728b_24-06-2022-04-23-41-034681.log 2>&1
[2022-06-24 04:25:53] [MainThread] [DEBUG ] [system_utils.py: 170]: output = b''
[2022-06-24 04:25:53] [MainThread] [DEBUG ] [system_utils.py: 171]: error = b''
[2022-06-24 04:25:53] [MainThread] [DEBUG ] [s3bench.py: 251]: Response: (True, "b''")
[2022-06-24 04:25:53] [MainThread] [INFO  ] [s3bench.py: 253]: Workload execution completed.
[2022-06-24 04:25:53] [MainThread] [DEBUG ] [s3bench.py: 97]: list response ["b''"]
[2022-06-24 04:25:53] [MainThread] [INFO  ] [near_full_data_storage.py: 134]: Workload: 315 objects of 134217728 with 10 parallel clients 
[2022-06-24 04:25:53] [MainThread] [INFO  ] [near_full_data_storage.py: 135]: Log Path log/latest/write_workload_134217728b_s3bench_10_315_134217728b_24-06-2022-04-23-41-034681.log
[2022-06-24 04:25:53] [MainThread] [INFO  ] [s3bench.py: 139]: Debug: Log File Path log/latest/write_workload_134217728b_s3bench_10_315_134217728b_24-06-2022-04-23-41-034681.log
[2022-06-24 04:25:53] [MainThread] [INFO  ] [s3bench.py: 150]: 'Error count' filtered list: ['    Errors Count:               258\n', '    Errors Count:               258\n', '    Errors Count:               258\n']

pavankrishnat commented 2 years ago

After a fresh re-deployment (1+14N cluster), Happy Path IOs passed; now testing degraded IOs. Degraded-mode IO job: https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/193/console (failed in Happy Path).

Manually tested IOs: read/wrote 1 GB files successfully 4 times in degraded mode (a data pod was failed with kubectl scale deploy cortx-data-ssc-vm-g3-rhev4-2278 --replicas 0).

mssawant commented 2 years ago

1N deployment fails due to a remove-bucket failure:

16:26:21  --------------- Remove 'test-bucket' bucket ---------------
16:26:21  
16:26:22  remove_bucket failed: s3://test-bucket An error occurred (BucketNotEmpty) when calling the DeleteBucket operation: Unknown

mssawant commented 2 years ago

Hare sanity fails with the same error:

16:30:21  --------------- Remove 'test-bucket' bucket ---------------
16:30:21  
16:30:21  remove_bucket failed: s3://test-bucket An error occurred (BucketNotEmpty) when calling the DeleteBucket operation: Unknown
mssawant commented 2 years ago

Tested a 15N deployment without DTM enabled: ioservice restart with continuous IO and degraded IO both worked. Tested a 15N deployment with DTM enabled: deployment completed for the data and server pods. Observed data and server pod restarts; eventually the cluster stabilized.

supriyachavan4398 commented 2 years ago

Testing:

Custom build: https://eos-jenkins.colo.seagate.com/job/GitHub-custom-ci-builds/job/generic/job/custom-ci/6938/
Deployment successful: https://eos-jenkins.colo.seagate.com/job/Cortx-Automation/job/RGW/job/setup-cortx-rgw-cluster/7924/

Error info: on the data pods, the motr ioservice containers restart repeatedly:

[root@ssc-vm-g2-rhev4-1630 ~]# kubectl logs cortx-data-ssc-vm-g2-rhev4-1632-5f9884c6d6-zd98q -c cortx-motr-io-001
2022-06-27 10:48:01,967 - executing command /usr/libexec/cortx-motr/motr-start m0d-0x7200000000000001:0x19
2022-06-27 10:48:02,017 - MOTR_M0D_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@21001
2022-06-27 10:48:02,019 - MOTR_PROCESS_FID: 0x7200000000000001:0x19
2022-06-27 10:48:02,019 - MOTR_HA_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@22001
2022-06-27 10:48:02,019 - MOTR_M0D_DATA_DIR: /etc/cortx/motr
2022-06-27 10:48:02,041 - motr transport : libfab
2022-06-27 10:48:02,063 - Service FID: m0d-0x7200000000000001:0x19
2022-06-27 10:48:02,080 - BE log size is not configured
2022-06-27 10:48:02,080 - + exec /usr/bin/m0d -e libfab:inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@21001 -A linuxstob:/etc/cortx/log/motr/4d00d87bf8a357c6471d06636ef677c5/addb/m0d-0x7200000000000001:0x19/addb-stobs -f '<0x7200000000000001:0x19>' -T ad -S stobs -D db -m 524288 -q 64 -E 32 -J 64 -H inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@22001 -U -B /dev/sdc -z 26843545600 -r 134217728
2022-06-27 10:48:10,281 - motr[00054]:  b060  ERROR  [reqh/reqh.c:454:m0_reqh_fop_allow]  <! rc=-111
2022-06-27 10:48:10,282 - motr[00054]:  b0e0   WARN  [reqh/reqh.c:561:m0_reqh_fop_handle]  fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:10,282 - motr[00054]:  b0e0  ERROR  [reqh/reqh.c:568:m0_reqh_fop_handle]  <! rc=-108 Service shutdown.
2022-06-27 10:48:11,281 - motr[00054]:  b060  ERROR  [reqh/reqh.c:454:m0_reqh_fop_allow]  <! rc=-111
2022-06-27 10:48:11,282 - motr[00054]:  b0e0   WARN  [reqh/reqh.c:561:m0_reqh_fop_handle]  fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:11,282 - motr[00054]:  b0e0  ERROR  [reqh/reqh.c:568:m0_reqh_fop_handle]  <! rc=-108 Service shutdown.
2022-06-27 10:48:12,282 - motr[00054]:  b060  ERROR  [reqh/reqh.c:454:m0_reqh_fop_allow]  <! rc=-111
2022-06-27 10:48:12,282 - motr[00054]:  b0e0   WARN  [reqh/reqh.c:561:m0_reqh_fop_handle]  fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:12,282 - motr[00054]:  b0e0  ERROR  [reqh/reqh.c:568:m0_reqh_fop_handle]  <! rc=-108 Service shutdown.
2022-06-27 10:48:13,282 - motr[00054]:  b060  ERROR  [reqh/reqh.c:454:m0_reqh_fop_allow]  <! rc=-111
2022-06-27 10:48:13,282 - motr[00054]:  b0e0   WARN  [reqh/reqh.c:561:m0_reqh_fop_handle]  fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:13,282 - motr[00054]:  b0e0  ERROR  [reqh/reqh.c:568:m0_reqh_fop_handle]  <! rc=-108 Service shutdown.
2022-06-27 10:48:14,282 - motr[00054]:  b060  ERROR  [reqh/reqh.c:454:m0_reqh_fop_allow]  <! rc=-111
2022-06-27 10:48:14,283 - motr[00054]:  b0e0   WARN  [reqh/reqh.c:561:m0_reqh_fop_handle]  fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:14,283 - motr[00054]:  b0e0  ERROR  [reqh/reqh.c:568:m0_reqh_fop_handle]  <! rc=-108 Service shutdown.
2022-06-27 10:48:15,288 - motr[00054]:  b060  ERROR  [reqh/reqh.c:454:m0_reqh_fop_allow]  <! rc=-111
2022-06-27 10:48:15,289 - motr[00054]:  b0e0   WARN  [reqh/reqh.c:561:m0_reqh_fop_handle]  fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:15,289 - motr[00054]:  b0e0  ERROR  [reqh/reqh.c:568:m0_reqh_fop_handle]  <! rc=-108 Service shutdown.
2022-06-27 10:48:16,283 - motr[00054]:  b060  ERROR  [reqh/reqh.c:454:m0_reqh_fop_allow]  <! rc=-111
2022-06-27 10:48:16,283 - motr[00054]:  b0e0   WARN  [reqh/reqh.c:561:m0_reqh_fop_handle]  fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:16,283 - motr[00054]:  b0e0  ERROR  [reqh/reqh.c:568:m0_reqh_fop_handle]  <! rc=-108 Service shutdown.
2022-06-27 10:48:16,887 - motr[00054]:  b060  ERROR  [reqh/reqh.c:454:m0_reqh_fop_allow]  <! rc=-111
p 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:31:55,237 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2632:435681331] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:32:08,969 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2646:168313715] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:32:57,518 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2694:716999746] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:33:46,788 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2743:986793723] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:34:39,075 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2796:273196139] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:02,138 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2819:337245942] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:08,552 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2825:750653292] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:19,937 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2837:136528432] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:26,134 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2843:333003119] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:46,423 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2863:620947470] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:57,450 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2874:648941618] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:36:09,491 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2886:689949617] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:36:51,014 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2928:213002793] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:37:13,985 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2951:184347233] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:37:22,946 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[2960:144947794] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:38:05,007 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[3002:205868984] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:39:26,802 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[3084:000385896] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:39:41,640 - motr[00054]:  79f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[3098:839329248] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
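
Reading the rc values in these logs: Motr functions conventionally return negated Linux errno codes, so rc=-111 is ECONNREFUSED (the request handler refuses the incoming "DTM0 redo" fops) and rc=-108 is ESHUTDOWN. A quick Python check, assuming that negated-errno convention:

import errno
import os

# Decode the negative rc values that appear in the m0d logs above.
for rc in (-111, -110, -108, -2):
    print(rc, errno.errorcode[-rc], '-', os.strerror(-rc))
# -111 ECONNREFUSED - Connection refused
# -110 ETIMEDOUT    - Connection timed out
# -108 ESHUTDOWN    - Cannot send after transport endpoint shutdown
# -2   ENOENT       - No such file or directory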

On the server pods, the cortx-rgw container restarts many times:

[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:469149801] seconds in processing]: fom=0x55a166f1c000, fop 0x55a166f1c0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468970870] seconds in processing]: fom=0x55a166ecf000, fop 0x55a166ecf0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468966147] seconds in processing]: fom=0x55a166ed1000, fop 0x55a166ed10b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468966303] seconds in processing]: fom=0x55a16701e000, fop 0x55a16701e0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468962428] seconds in processing]: fom=0x55a166f22000, fop 0x55a166f220b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468957949] seconds in processing]: fom=0x55a166fff000, fop 0x55a166fff0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468912155] seconds in processing]: fom=0x55a167029000, fop 0x55a1670290b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468912028] seconds in processing]: fom=0x55a167026000, fop 0x55a1670260b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468907147] seconds in processing]: fom=0x55a166f11000, fop 0x55a166f110b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468897861] seconds in processing]: fom=0x55a167001000, fop 0x55a1670010b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468944515] seconds in processing]: fom=0x55a16701b000, fop 0x55a16701b0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468946415] seconds in processing]: fom=0x55a1670bb000, fop 0x55a1670bb0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468951873] seconds in processing]: fom=0x55a1670bd000, fop 0x55a1670bd0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468960705] seconds in processing]: fom=0x55a1670d1000, fop 0x55a1670d10b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468956818] seconds in processing]: fom=0x55a1670d3000, fop 0x55a1670d30b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:468953819] seconds in processing]: fom=0x55a166fee000, fop 0x55a166fee0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:377866106] seconds in processing]: fom=0x55a166f26000, fop 0x55a166f260b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:362781693] seconds in processing]: fom=0x55a166e52000, fop 0x55a166e520b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[133:331610089] seconds in processing]: fom=0x55a166f20000, fop 0x55a166f200b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]:  63f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[128:607100602] seconds in processing]: fom=0x55a166e50000, fop 0x55a166e500b8[0] phase: RFS_WAITING
[2022-06-27 11:43:06] 2022-06-27T11:43:06.557+0000 7f1fd3f45700 -1 rgw dbstore: Initialization timeout, failed to initialize
[root@ssc-vm-g2-rhev4-1630 ~]#

Degraded IO failed with an error. One of the data pods was failed by scaling its deployment to zero replicas: kubectl scale deploy cortx-data-ssc-vm-rhev4-2450 --replicas 0

[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 ls --endpoint-url http://$IP:$Port
2022-06-27 05:00:18 test
2022-06-27 05:00:25 test2
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 cp sanityIO1gb s3://test2/sanityIO1gb --endpoint-url http://$IP:$Port
upload failed: ./sanityIO1gb to s3://test2/sanityIO1gb An error occurred (UnknownError) when calling the CreateMultipartUpload operation (reached max retries: 4): Unknown
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 cp sanityIO1gb s3://test2/sanityIO1gb1 --endpoint-url http://$IP:$Port
upload failed: ./sanityIO1gb to s3://test2/sanityIO1gb1 An error occurred (UnknownError) when calling the CreateMultipartUpload operation (reached max retries: 4): Unknown

cc. @mssawant, @pavankrishnat, @vaibhavparatwar

supriyachavan4398 commented 2 years ago

In the degraded state, ran an s3bench command to test continuous read/write IOs. s3bench panicked, and data pod restarts were observed.

[root@ssc-vm-g4-rhev4-1587 ~]# kubectl get pods
NAME                                                 READY   STATUS    RESTARTS      AGE
cortx-consul-client-49j9c                            1/1     Running   0             6h31m
cortx-consul-client-5znhr                            1/1     Running   0             6h30m
cortx-consul-client-knb52                            1/1     Running   0             6h30m
cortx-consul-client-n4fkc                            1/1     Running   0             6h31m
cortx-consul-client-pkttk                            1/1     Running   0             6h32m
cortx-consul-server-0                                1/1     Running   0             6h30m
cortx-consul-server-1                                1/1     Running   0             6h31m
cortx-consul-server-2                                1/1     Running   0             6h32m
cortx-control-646746fdf4-54zcj                       1/1     Running   0             6h29m
cortx-data-ssc-vm-g4-rhev4-1588-79c7dd59fb-kmx2n     4/4     Running   1 (95m ago)   6h28m
cortx-data-ssc-vm-g4-rhev4-1589-d99d5899-5d9lt       4/4     Running   0             6h28m
cortx-data-ssc-vm-rhev4-2450-668b7fbccc-mfd2k        4/4     Running   1 (70m ago)   101m
cortx-data-ssc-vm-rhev4-2451-5cccc45c78-wp4zv        4/4     Running   1 (66m ago)   6h28m
cortx-data-ssc-vm-rhev4-2635-6f8575ddcd-kcs8p        4/4     Running   1 (86m ago)   6h28m
cortx-ha-6bc5b9557-lqhpm                             3/3     Running   0             6h24m
cortx-kafka-0                                        1/1     Running   0             6h34m
cortx-kafka-1                                        1/1     Running   0             6h34m
cortx-kafka-2                                        1/1     Running   0             6h34m
cortx-server-ssc-vm-g4-rhev4-1588-59fcb59654-6wtkp   2/2     Running   0             6h26m
cortx-server-ssc-vm-g4-rhev4-1589-b57677c44-8b45x    2/2     Running   0             6h26m
cortx-server-ssc-vm-rhev4-2450-7fbf87b59b-w4c24      2/2     Running   0             6h26m
cortx-server-ssc-vm-rhev4-2451-6c8cbd687d-qbv2g      2/2     Running   0             6h26m
cortx-server-ssc-vm-rhev4-2635-f678545db-lnpxg       2/2     Running   0             6h26m
cortx-zookeeper-0                                    1/1     Running   0             6h34m
cortx-zookeeper-1                                    1/1     Running   0             6h34m
cortx-zookeeper-2                                    1/1     Running   0             6h34m
[root@ssc-vm-g4-rhev4-1587 ~]# ./s3bench.2020-04-09 -accessKey sgiamadmin -accessSecret ldapadmin -bucket test-bucket1 -endpoint http://10.96.157.21:8081 -numClients 1 -numSamples 1 -objectNamePrefix=s3workload -objectSize 1Mb -verbose > /root/s3bench_1Mb_50Ksamples.log -region us-east-1
panic: Failed to create bucket: RequestError: send request failed
caused by: Put http://10.96.157.21:8081/test-bucket1: dial tcp 10.96.157.21:8081: i/o timeout

goroutine 1 [running]:
main.(*Params).prepareBucket(0xc0001cadd0, 0xc00030a000, 0x8a6dc9)
        /home/720554/proj/mero_s3bench/s3bench/s3bench.go:54 +0x32f
main.main()
        /home/720554/proj/mero_s3bench/s3bench/s3bench.go:179 +0xfa5

In the hare-hax logs, a Motr panic was found:

motr[00097]:  7c40  ERROR  [spiel/cmd.c:2126:spiel_proc_counter_item_rlink_cb]  connect failed
2022-06-28 10:01:59,682 [ERROR] {byte-count-updater} Failed due to Bytecount stats unavailable. Aborting this iteration. Waiting for next attempt.
Traceback (most recent call last):
  File "/opt/seagate/cortx/hare/lib64/python3.6/site-packages/hax/bytecount.py", line 141, in _execute
    motr.get_proc_bytecount(ios)
  File "/opt/seagate/cortx/hare/lib64/python3.6/site-packages/hax/motr/__init__.py", line 754, in get_proc_bytecount
    raise BytecountException('Bytecount stats unavailable')
hax.exception.BytecountException
motr[00097]:  fbc0  FATAL  [lib/assert.c:50:m0_panic]  panic: fatal signal delivered at unknown() (unknown:0)  [git: 2.0.0-837-7-g03498588] /etc/cortx/hare/config/671459a81222ef684c19108fc9b89516/m0trace.97.2022-06-28-05:10:37
Motr panic: fatal signal delivered at unknown() unknown:0 (errno: 111) (last failed: none) [git: 2.0.0-837-7-g03498588] pid: 97  /etc/cortx/hare/config/671459a81222ef684c19108fc9b89516/m0trace.97.2022-06-28-05:10:37
Motr panic reason: signo: 11
/lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7fd4a22b5573]
/lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7fd4a22b5749]
/lib64/libmotr.so.2(m0_panic+0x13d)[0x7fd4a22a414d]
/lib64/libmotr.so.2(+0x3a079c)[0x7fd4a22b579c]
/lib64/libpthread.so.0(+0x12b30)[0x7fd4aab2cb30]
/lib64/libmotr.so.2(m0_tlist_next+0xc)[0x7fd4a22abfac]
/lib64/libmotr.so.2(+0x424c1e)[0x7fd4a2339c1e]
/lib64/libmotr.so.2(m0_rpc_frm_enq_item+0x2f0)[0x7fd4a233a7b0]
/lib64/libmotr.so.2(m0_rpc_item_send+0x13c)[0x7fd4a233f84c]
/lib64/libmotr.so.2(+0x42ba1e)[0x7fd4a2340a1e]
/lib64/libmotr.so.2(m0_sm_asts_run+0x131)[0x7fd4a234cab1]
/lib64/libmotr.so.2(m0_rpc_machine_lock+0x43)[0x7fd4a2345bb3]
/lib64/libmotr.so.2(rpc_worker_thread_fn+0x69)[0x7fd4a2346149]
/lib64/libmotr.so.2(m0_thread_trampoline+0x5e)[0x7fd4a22aacde]
/lib64/libmotr.so.2(+0x3a1431)[0x7fd4a22b6431]
/lib64/libpthread.so.0(+0x815a)[0x7fd4aab2215a]
/lib64/libc.so.6(clone+0x43)[0x7fd4aa0c7dd3]
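
The traceback above shows hax's byte-count updater giving up on the current polling iteration when stats are unavailable. A simplified sketch of that loop; only get_proc_bytecount and BytecountException are taken from the log, the rest (names, interval) is hypothetical:

import logging
import time

class BytecountException(Exception):
    """Stand-in for hax.exception.BytecountException from the traceback."""

def byte_count_updater(motr, ioservices, interval_sec=60):
    # Periodically poll each ioservice for byte-count stats; on failure,
    # abort the current iteration and wait for the next attempt.
    while True:
        try:
            for ios in ioservices:
                motr.get_proc_bytecount(ios)  # Spiel call per the traceback
        except BytecountException:
            logging.exception('Failed due to Bytecount stats unavailable. '
                              'Aborting this iteration.')
        time.sleep(interval_sec)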

2022-06-28 05:10:27,803 - executing command /usr/libexec/cortx-motr/motr-start m0d-0x7200000000000001:0x3
2022-06-28 05:10:27,873 - MOTR_M0D_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1588@22002
2022-06-28 05:10:27,875 - MOTR_PROCESS_FID: 0x7200000000000001:0x3
2022-06-28 05:10:27,875 - MOTR_HA_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1588@22001
2022-06-28 05:10:27,875 - MOTR_M0D_DATA_DIR: /etc/cortx/motr
2022-06-28 05:10:27,875 - MOTR_CONF_XC: /etc/motr/confd.xc
2022-06-28 05:10:27,899 - motr transport : libfab
2022-06-28 05:10:27,916 - Service FID: m0d-0x7200000000000001:0x3
2022-06-28 05:10:27,951 - BE log size is not configured
2022-06-28 05:10:27,951 - + exec /usr/bin/m0d -e libfab:inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1588@22002 -A linuxstob:/etc/cortx/log/motr/671459a81222ef684c19108fc9b89516/addb/m0d-0x7200000000000001:0x3/addb-stobs -f '<0x7200000000000001:0x3>' -T linux -S stobs -D db -m 524288 -q 64 -E 32 -J 64 -c /etc/motr/confd.xc -H inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1588@22001 -U -r 134217728
2022-06-28 05:10:32,908 - motr[00036]:  da60   WARN  [ha/entrypoint.c:563:ha_entrypoint_client_fom_tick]  rlk_rc=-110
2022-06-28 05:10:36,913 - motr[00036]:  da60   WARN  [ha/entrypoint.c:563:ha_entrypoint_client_fom_tick]  rlk_rc=-110
2022-06-28 05:11:16,604 - motr[00036]:  b190  ERROR  [conf/helpers.c:552:m0_conf_process2service_get]  <! rc=-2
2022-06-28 05:11:16,605 - Started
2022-06-28 05:11:16,605 - m0d: systemd notifications not allowed
2022-06-28 05:11:16,605 -
2022-06-28 05:11:16,605 - Press CTRL+C to quit.
2022-06-28 09:39:49,146 - motr[00036]:  d2a0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:0>: nr_failures:3 max_failures:2 event_index:6 event_state:3
2022-06-28 09:39:49,148 - motr[00036]:  d2a0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:0>: nr_failures:4 max_failures:2 event_index:7 event_state:3
2022-06-28 09:56:51,777 - motr[00036]:  d2a0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:0>: nr_failures:3 max_failures:2 event_index:4 event_state:1
2022-06-28 09:56:51,869 - motr[00036]:  af40  ERROR  [net/ip.c:452:m0_net_hostname_to_ip]  gethostbyname err=1 for 172-16-18-246.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
2022-06-28 09:56:51,869 - motr[00036]:  af40  ERROR  [net/ip.c:454:m0_net_hostname_to_ip]  <! rc=1
2022-06-28 09:56:51,869 - motr[00036]:  b0e0  ERROR  [net/libfab/libfab.c:2261:libfab_dns_resolve_retry]  gethostbyname() failed with err 1 for 172-16-18-246.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local@21001
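
The gethostbyname failures above are expected while the data pod is down: each pod gets a per-pod DNS record under its headless Service, and that record disappears with the pod (err=1 is HOST_NOT_FOUND). A minimal reproduction sketch, runnable from any pod in the cluster:

import socket

# Per-pod record under the headless Service; gone once the pod is scaled down.
host = ('172-16-18-246.cortx-data-headless-svc-ssc-vm-rhev4-2450'
        '.cortx.svc.cluster.local')
try:
    print(socket.gethostbyname(host))
except socket.gaierror as exc:
    # Matches "gethostbyname err=1" in the m0d log above.
    print('resolution failed:', exc)
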
cc. @mssawant, @vaibhavparatwar, @pavankrishnat

mssawant commented 2 years ago

retest this please

mssawant commented 2 years ago

Tested a 6N deployment: regular IO, degraded IO with node failure, and degraded IO with ioservice restart.


[root@ssc-vm-g2-rhev4-1630 ~]# kubectl get pods
NAME                                                 READY   STATUS    RESTARTS   AGE
cortx-consul-client-4fd95                            1/1     Running   0          11m
cortx-consul-client-68kkv                            1/1     Running   0          10m
cortx-consul-client-7t629                            1/1     Running   0          11m
cortx-consul-client-dt67m                            1/1     Running   0          11m
cortx-consul-client-vhcqs                            1/1     Running   0          11m
cortx-consul-client-zp57r                            1/1     Running   0          11m
cortx-consul-server-0                                1/1     Running   0          9m34s
cortx-consul-server-1                                1/1     Running   0          10m
cortx-consul-server-2                                1/1     Running   0          11m
cortx-control-6555bcd848-s4h8g                       1/1     Running   0          9m2s
cortx-data-ssc-vm-g2-rhev4-1630-856bb78668-7kbjx     4/4     Running   0          7m57s
cortx-data-ssc-vm-g2-rhev4-1631-84b56d8955-6n4qg     4/4     Running   0          7m56s
cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp      4/4     Running   0          7m55s
cortx-data-ssc-vm-g2-rhev4-1635-7bb4cc8b75-n5dpg     4/4     Running   0          7m54s
cortx-data-ssc-vm-g2-rhev4-2237-86787d97f8-2rs5b     4/4     Running   0          7m53s
cortx-data-ssc-vm-g2-rhev4-2238-777c6f78cf-md6ph     4/4     Running   0          7m52s
cortx-ha-5769c7f7cc-zcg4m                            3/3     Running   0          4m16s
cortx-kafka-0                                        1/1     Running   0          13m
cortx-kafka-1                                        1/1     Running   0          13m
cortx-kafka-2                                        1/1     Running   0          13m
cortx-server-ssc-vm-g2-rhev4-1630-5fd67bb9b8-t46m4   2/2     Running   0          6m13s
cortx-server-ssc-vm-g2-rhev4-1631-77d89d568b-gd5mc   2/2     Running   0          6m12s
cortx-server-ssc-vm-g2-rhev4-1632-5597c88b68-r58rd   2/2     Running   0          6m12s
cortx-server-ssc-vm-g2-rhev4-1635-8667f864b6-t65gt   2/2     Running   0          6m11s
cortx-server-ssc-vm-g2-rhev4-2237-8b4445548-vvv94    2/2     Running   0          6m10s
cortx-server-ssc-vm-g2-rhev4-2238-5d6f9f6d9d-v9tc7   2/2     Running   0          6m9s
cortx-zookeeper-0                                    1/1     Running   0          13m
cortx-zookeeper-1                                    1/1     Running   0          13m
cortx-zookeeper-2                                    1/1     Running   0          13m
[root@ssc-vm-g2-rhev4-1630 ~]#

[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 mb s3://test
make_bucket: test
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 cp file_1G s3://test
upload: ./file_1G to s3://test/file_1G
[root@ssc-vm-g2-rhev4-1630 ~]#

[root@ssc-vm-g2-rhev4-1630 ~]# kubectl get deployments
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
cortx-control                       1/1     1            1           13m
cortx-data-ssc-vm-g2-rhev4-1630     1/1     1            1           11m
cortx-data-ssc-vm-g2-rhev4-1631     1/1     1            1           11m
cortx-data-ssc-vm-g2-rhev4-1632     1/1     1            1           11m
cortx-data-ssc-vm-g2-rhev4-1635     1/1     1            1           11m
cortx-data-ssc-vm-g2-rhev4-2237     1/1     1            1           11m
cortx-data-ssc-vm-g2-rhev4-2238     1/1     1            1           11m
cortx-ha                            1/1     1            1           8m16s
cortx-server-ssc-vm-g2-rhev4-1630   1/1     1            1           10m
cortx-server-ssc-vm-g2-rhev4-1631   1/1     1            1           10m
cortx-server-ssc-vm-g2-rhev4-1632   1/1     1            1           10m
cortx-server-ssc-vm-g2-rhev4-1635   1/1     1            1           10m
cortx-server-ssc-vm-g2-rhev4-2237   1/1     1            1           10m
cortx-server-ssc-vm-g2-rhev4-2238   1/1     1            1           10m
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl get deployment cortx-data-ssc-vm-g2-rhev4-1635 -o yaml > cortx-data-ssc-vm-g2-rhev4-1635.yaml
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl delete deployment cortx-data-ssc-vm-g2-rhev4-1635
deployment.apps "cortx-data-ssc-vm-g2-rhev4-1635" deleted
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl exec -it cortx-data-ssc-vm-g2-rhev4-1630-856bb78668-7kbjx -c cortx-hax -- /bin/bash
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1630 /]#

# Degraded write

[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-1630 /]# consul kv get -recurse processes | grep STOPPED
processes/0x7200000000000001:0xc:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
processes/0x7200000000000001:0xd:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0xe:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0xf:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-1630 /]# hctl status -d | grep offline
    [offline]  hax                 0x7200000000000001:0xc          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1635@22001
    [offline]  ioservice           0x7200000000000001:0xd          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1635@21001
    [offline]  ioservice           0x7200000000000001:0xe          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1635@21002
    [offline]  confd               0x7200000000000001:0xf          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1635@22002
    [offline]  /dev/sdd
    [offline]  /dev/sde
    [offline]  /dev/sdc
    [offline]  /dev/sdg
    [offline]  /dev/sdh
    [offline]  /dev/sdf
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-1630 /]#
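
The same check can be done programmatically: Hare stores per-process state as JSON under the processes/ prefix in Consul KV, so the grep above is equivalent to this sketch (uses the python-consul package, assumed to be available on the pod):

import json

import consul  # python-consul, talks to the local Consul agent

c = consul.Consul()
_, entries = c.kv.get('processes/', recurse=True)
for entry in entries or []:
    info = json.loads(entry['Value'])
    if info.get('state') == 'M0_CONF_HA_PROCESS_STOPPED':
        print(entry['Key'], info.get('type'))
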
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 cp file_1G_2 s3://test
upload: ./file_1G_2 to s3://test/file_1G_2
[root@ssc-vm-g2-rhev4-1630 ~]#
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 ls s3://test
2022-06-28 10:30:59 1073741824 file_1G
2022-06-28 11:02:21 1073741824 file_1G_2
[root@ssc-vm-g2-rhev4-1630 ~]#

# Restarting failed pod

[root@ssc-vm-g2-rhev4-1630 ~]# kubectl apply -f cortx-data-ssc-vm-g2-rhev4-1635.yaml
deployment.apps/cortx-data-ssc-vm-g2-rhev4-1635 created
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl get pods
NAME                                                 READY   STATUS    RESTARTS   AGE
cortx-consul-client-4fd95                            1/1     Running   0          51m
cortx-consul-client-68kkv                            1/1     Running   0          51m
cortx-consul-client-7t629                            1/1     Running   0          52m
cortx-consul-client-dt67m                            1/1     Running   0          51m
cortx-consul-client-vhcqs                            1/1     Running   0          52m
cortx-consul-client-zp57r                            1/1     Running   0          51m
cortx-consul-server-0                                1/1     Running   0          50m
cortx-consul-server-1                                1/1     Running   0          51m
cortx-consul-server-2                                1/1     Running   0          52m
cortx-control-6555bcd848-s4h8g                       1/1     Running   0          49m
cortx-data-ssc-vm-g2-rhev4-1630-856bb78668-7kbjx     4/4     Running   0          48m
cortx-data-ssc-vm-g2-rhev4-1631-84b56d8955-6n4qg     4/4     Running   0          48m
cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp      4/4     Running   0          48m
cortx-data-ssc-vm-g2-rhev4-1635-7bb4cc8b75-vqdq2     4/4     Running   0          2m52s
cortx-data-ssc-vm-g2-rhev4-2237-86787d97f8-2rs5b     4/4     Running   0          48m
cortx-data-ssc-vm-g2-rhev4-2238-777c6f78cf-md6ph     4/4     Running   0          48m
cortx-ha-5769c7f7cc-zcg4m                            3/3     Running   0          44m
cortx-kafka-0                                        1/1     Running   0          53m
cortx-kafka-1                                        1/1     Running   0          53m
cortx-kafka-2                                        1/1     Running   0          53m
cortx-server-ssc-vm-g2-rhev4-1630-5fd67bb9b8-t46m4   2/2     Running   0          46m
cortx-server-ssc-vm-g2-rhev4-1631-77d89d568b-gd5mc   2/2     Running   0          46m
cortx-server-ssc-vm-g2-rhev4-1632-5597c88b68-r58rd   2/2     Running   0          46m
cortx-server-ssc-vm-g2-rhev4-1635-8667f864b6-t65gt   2/2     Running   0          46m
cortx-server-ssc-vm-g2-rhev4-2237-8b4445548-vvv94    2/2     Running   0          46m
cortx-server-ssc-vm-g2-rhev4-2238-5d6f9f6d9d-v9tc7   2/2     Running   0          46m
cortx-zookeeper-0                                    1/1     Running   0          53m
cortx-zookeeper-1                                    1/1     Running   0          53m
cortx-zookeeper-2                                    1/1     Running   0          53m
[root@ssc-vm-g2-rhev4-1630 ~]#
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1630 /]# consul kv get -recurse | grep STOPPED
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1630 /]#
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 cp file_1G s3://test/file_1G_3
upload: ./file_1G to s3://test/file_1G_3
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 ls s3://test
2022-06-28 10:30:59 1073741824 file_1G
2022-06-28 11:02:21 1073741824 file_1G_2
2022-06-28 11:09:04 1073741824 file_1G_3
[root@ssc-vm-g2-rhev4-1630 ~]#

# Read file written in degraded mode

[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 cp file_1G s3://test/file_1G_3
upload: ./file_1G to s3://test/file_1G_3
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 ls s3://test
2022-06-28 10:30:59 1073741824 file_1G
2022-06-28 11:02:21 1073741824 file_1G_2
2022-06-28 11:09:04 1073741824 file_1G_3
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 cp s3://test/file_1G_2 file_1G_2_read_after_node_restart
download: s3://test/file_1G_2 to ./file_1G_2_read_after_node_restart
[root@ssc-vm-g2-rhev4-1630 ~]# diff file_1G_2 file_1G_2_read_after_node_restart
[root@ssc-vm-g2-rhev4-1630 ~]#

# Continuous IO test with process restart

[root@ssc-vm-g2-rhev4-1630 ~]# kubectl exec -it cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp -c cortx-motr-io-001
error: you must specify at least one command for the container
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl exec -it cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp -c cortx-motr-io-001 -- /bin/bash
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1632 /]# ps -aux | grep m0d
root         36  0.0  0.0  12408  1708 ?        S    16:21   0:00 /usr/bin/bash /usr/libexec/cortx-motr/motr-start m0d-0x7200000000000001:0x15
root         37 71.3  2.4 32537996 396412 ?     Sl   16:21  36:35 /usr/bin/m0d -e libfab:inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@21001 -A linuxstob:/etc/cortx/log/motr/e35889b2e1f4b04e18e9e501e6368966/addb/m0d-0x7200000000000001:0x15/addb-stobs -f <0x7200000000000001:0x15> -T ad -S stobs -D db -m 524288 -q 64 -E 32 -J 64 -H inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@22001 -U -B /dev/sdc -z 26843545600 -r 134217728
root        300  0.0  0.0   9204   752 pts/0    S+   17:12   0:00 grep --color=auto m0d
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1632 /]# kill -9 37
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1632 /]# command terminated with exit code 137
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl get pods
NAME                                                 READY   STATUS    RESTARTS      AGE
cortx-consul-client-4fd95                            1/1     Running   0             57m
cortx-consul-client-68kkv                            1/1     Running   0             57m
cortx-consul-client-7t629                            1/1     Running   0             58m
cortx-consul-client-dt67m                            1/1     Running   0             57m
cortx-consul-client-vhcqs                            1/1     Running   0             58m
cortx-consul-client-zp57r                            1/1     Running   0             57m
cortx-consul-server-0                                1/1     Running   0             56m
cortx-consul-server-1                                1/1     Running   0             57m
cortx-consul-server-2                                1/1     Running   0             58m
cortx-control-6555bcd848-s4h8g                       1/1     Running   0             55m
cortx-data-ssc-vm-g2-rhev4-1630-856bb78668-7kbjx     4/4     Running   0             54m
cortx-data-ssc-vm-g2-rhev4-1631-84b56d8955-6n4qg     4/4     Running   0             54m
cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp      4/4     Running   1 (64s ago)   54m
cortx-data-ssc-vm-g2-rhev4-1635-7bb4cc8b75-vqdq2     4/4     Running   0             8m56s
cortx-data-ssc-vm-g2-rhev4-2237-86787d97f8-2rs5b     4/4     Running   0             54m
cortx-data-ssc-vm-g2-rhev4-2238-777c6f78cf-md6ph     4/4     Running   0             54m
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1630 /]# consul kv get -recurse | grep process_restart
cortx-data-headless-svc-ssc-vm-g2-rhev4-1630/process_restarts/0x7200000000000001:0x0:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1630/process_restarts/0x7200000000000001:0x1:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1630/process_restarts/0x7200000000000001:0x1b:"1"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1630/process_restarts/0x7200000000000001:0x2:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1630/process_restarts/0x7200000000000001:0x3:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1631/process_restarts/0x7200000000000001:0x4:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1631/process_restarts/0x7200000000000001:0x5:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1631/process_restarts/0x7200000000000001:0x6:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1631/process_restarts/0x7200000000000001:0x7:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1632/process_restarts/0x7200000000000001:0x14:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1632/process_restarts/0x7200000000000001:0x15:"3"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1632/process_restarts/0x7200000000000001:0x16:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1632/process_restarts/0x7200000000000001:0x17:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1635/process_restarts/0x7200000000000001:0xc:"3"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1635/process_restarts/0x7200000000000001:0xd:"3"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1635/process_restarts/0x7200000000000001:0xe:"3"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1635/process_restarts/0x7200000000000001:0xf:"3"
cortx-data-headless-svc-ssc-vm-g2-rhev4-2237/process_restarts/0x7200000000000001:0x10:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-2237/process_restarts/0x7200000000000001:0x11:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-2237/process_restarts/0x7200000000000001:0x12:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-2237/process_restarts/0x7200000000000001:0x13:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-2238/process_restarts/0x7200000000000001:0x8:"2"

mssawant commented 2 years ago

15N deployment with regular and degraded IO


[root@ssc-vm-g2-rhev4-3031 ~]# kubectl get pods
NAME                                                 READY   STATUS    RESTARTS        AGE
cortx-consul-client-4hzrb                            1/1     Running   0               24m
cortx-consul-client-5qcsx                            1/1     Running   0               24m
cortx-consul-client-6lbfr                            1/1     Running   0               23m
cortx-consul-client-6n48h                            1/1     Running   0               24m
cortx-consul-client-7tkm6                            1/1     Running   0               23m
cortx-consul-client-82q62                            1/1     Running   0               24m
cortx-consul-client-98fbz                            1/1     Running   0               23m
cortx-consul-client-9mnjj                            1/1     Running   0               24m
cortx-consul-client-kskcz                            1/1     Running   0               23m
cortx-consul-client-mrlt5                            1/1     Running   0               23m
cortx-consul-client-p72h5                            1/1     Running   0               24m
cortx-consul-client-qmrh4                            1/1     Running   0               23m
cortx-consul-client-s9wq6                            1/1     Running   0               23m
cortx-consul-client-w8hgx                            1/1     Running   0               23m
cortx-consul-client-wg5bt                            1/1     Running   0               23m
cortx-consul-server-0                                1/1     Running   0               23m
cortx-consul-server-1                                1/1     Running   0               24m
cortx-consul-server-2                                1/1     Running   0               24m
cortx-control-76df54775b-jmg5n                       1/1     Running   0               22m
cortx-data-ssc-vm-g2-rhev4-3031-6b599df947-j2zxx     4/4     Running   0               20m
cortx-data-ssc-vm-g2-rhev4-3156-557664b9d7-qm9lv     4/4     Running   0               20m
cortx-data-ssc-vm-g2-rhev4-3157-5b48475f6d-slv84     4/4     Running   0               20m
cortx-data-ssc-vm-g2-rhev4-3158-7cbf676885-mksfz     4/4     Running   0               20m
cortx-data-ssc-vm-g2-rhev4-3159-8557b9d6bb-q94f9     4/4     Running   0               20m
cortx-data-ssc-vm-g2-rhev4-3160-7d796bf9f8-r5cs6     4/4     Running   0               20m
cortx-data-ssc-vm-g2-rhev4-3166-98ccb94b4-hn7mj      4/4     Running   0               20m
cortx-data-ssc-vm-g2-rhev4-3167-57fc4c9799-hhntj     4/4     Running   0               20m
cortx-data-ssc-vm-g2-rhev4-3168-5d67b5d498-xlzbz     4/4     Running   0               20m
cortx-data-ssc-vm-g2-rhev4-3169-69f4798f85-g7m2v     4/4     Running   0               20m
cortx-data-ssc-vm-g2-rhev4-3170-85b4cf4cc7-bghbc     4/4     Running   0               20m
cortx-data-ssc-vm-g4-rhev4-1583-8479f977f-st2gm      4/4     Running   0               20m
cortx-data-ssc-vm-g4-rhev4-1590-74687dd45f-28f4j     4/4     Running   0               20m
cortx-data-ssc-vm-g4-rhev4-1591-97b6d94d4-h4s6f      4/4     Running   0               20m
cortx-data-ssc-vm-g4-rhev4-1592-6f75c67fbf-pnlth     4/4     Running   0               20m
cortx-ha-6c856f7d6c-m5hn4                            3/3     Running   0               10m
cortx-kafka-0                                        1/1     Running   1 (25m ago)     26m
cortx-kafka-1                                        1/1     Running   1 (25m ago)     26m
cortx-kafka-2                                        1/1     Running   2 (25m ago)     26m
cortx-server-ssc-vm-g2-rhev4-3031-6c4ddbf8bb-wwql7   2/2     Running   0               16m
cortx-server-ssc-vm-g2-rhev4-3156-f67bbbddb-vlr8t    2/2     Running   1 (5m25s ago)   16m
cortx-server-ssc-vm-g2-rhev4-3157-5bcb46c47b-vvnqd   2/2     Running   0               16m
cortx-server-ssc-vm-g2-rhev4-3158-5689c6d966-f6nst   2/2     Running   0               16m
cortx-server-ssc-vm-g2-rhev4-3159-7f567ddc56-847m4   2/2     Running   0               16m
cortx-server-ssc-vm-g2-rhev4-3160-767789996-fblkr    2/2     Running   0               16m
cortx-server-ssc-vm-g2-rhev4-3166-87fdcb968-z9skn    2/2     Running   0               16m
cortx-server-ssc-vm-g2-rhev4-3167-659fbbb55d-f86l2   2/2     Running   1 (5m25s ago)   16m
cortx-server-ssc-vm-g2-rhev4-3168-79fd6d98d4-hsxlh   2/2     Running   0               16m
cortx-server-ssc-vm-g2-rhev4-3169-6bb7f8d54b-jxhdd   2/2     Running   0               16m
cortx-server-ssc-vm-g2-rhev4-3170-8577748546-g9r6j   2/2     Running   0               16m
cortx-server-ssc-vm-g4-rhev4-1583-8699c7cfcd-gf9n8   2/2     Running   0               16m
cortx-server-ssc-vm-g4-rhev4-1590-6585896df4-xlh7r   2/2     Running   0               16m
cortx-server-ssc-vm-g4-rhev4-1591-79fcf98c5b-v7xkc   2/2     Running   1 (5m24s ago)   16m
cortx-server-ssc-vm-g4-rhev4-1592-66fc68d58-wkn8h    2/2     Running   0               16m
cortx-zookeeper-0                                    1/1     Running   0               26m
cortx-zookeeper-1                                    1/1     Running   0               26m
cortx-zookeeper-2                                    1/1     Running   0               26m
[root@ssc-vm-g2-rhev4-3031 ~]# kubectl exec -it cortx-data-ssc-vm-g2-rhev4-3031-6b599df947-j2zxx -c cortx-hax -- /bin/bash
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]# consul kv get -recurse processes | egrep 'STOPPED|STARTING|STOPPING'
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]# hctl status -d
Bytecount:
    critical : 0
    damaged : 0
    degraded : 0
    healthy : 0
Data pool:
    # fid name
    0x6f00000000000001:0x0 'storage-set-1__sns'
Profile:
    # fid name: pool(s)
    0x7000000000000001:0x0 'Profile_the_pool': 'storage-set-1__sns' 'storage-set-1__dix' None
Services:
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3168
    [started]  hax                 0x7200000000000001:0x0          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3168@22001
    [started]  ioservice           0x7200000000000001:0x1          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3168@21001
    [started]  ioservice           0x7200000000000001:0x2          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3168@21002
    [started]  confd               0x7200000000000001:0x3          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3168@22002
    cortx-data-headless-svc-ssc-vm-g4-rhev4-1590
    [started]  hax                 0x7200000000000001:0x4          inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1590@22001
    [started]  ioservice           0x7200000000000001:0x5          inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1590@21001
    [started]  ioservice           0x7200000000000001:0x6          inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1590@21002
    [started]  confd               0x7200000000000001:0x7          inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1590@22002
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3031
    [started]  hax                 0x7200000000000001:0x3e         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3031@22001
    [started]  rgw_s3              0x7200000000000001:0x3f         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3031@21001
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3159
    [started]  hax                 0x7200000000000001:0x40         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3159@22001
    [started]  rgw_s3              0x7200000000000001:0x41         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3159@21001
    cortx-server-headless-svc-ssc-vm-g4-rhev4-1591
    [started]  hax                 0x7200000000000001:0x42         inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1591@22001
    [started]  rgw_s3              0x7200000000000001:0x43         inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1591@21001
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3168
    [started]  hax                 0x7200000000000001:0x44         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3168@22001
    [started]  rgw_s3              0x7200000000000001:0x45         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3168@21001
    cortx-server-headless-svc-ssc-vm-g4-rhev4-1592
    [started]  hax                 0x7200000000000001:0x46         inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1592@22001
    [started]  rgw_s3              0x7200000000000001:0x47         inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1592@21001
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3156
    [started]  hax                 0x7200000000000001:0x48         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3156@22001
    [started]  rgw_s3              0x7200000000000001:0x49         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3156@21001
    cortx-server-headless-svc-ssc-vm-g4-rhev4-1590
    [started]  hax                 0x7200000000000001:0x4a         inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1590@22001
    [started]  rgw_s3              0x7200000000000001:0x4b         inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1590@21001
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3167
    [started]  hax                 0x7200000000000001:0x4c         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3167@22001
    [started]  rgw_s3              0x7200000000000001:0x4d         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3167@21001
    cortx-server-headless-svc-ssc-vm-g4-rhev4-1583
    [started]  hax                 0x7200000000000001:0x4e         inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1583@22001
    [started]  rgw_s3              0x7200000000000001:0x4f         inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1583@21001
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3170
    [started]  hax                 0x7200000000000001:0x50         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3170@22001
    [started]  rgw_s3              0x7200000000000001:0x51         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3170@21001
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3160
    [started]  hax                 0x7200000000000001:0x52         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3160@22001
    [started]  rgw_s3              0x7200000000000001:0x53         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3160@21001
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3157
    [started]  hax                 0x7200000000000001:0x54         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3157@22001
    [started]  rgw_s3              0x7200000000000001:0x55         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3157@21001
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3166
    [started]  hax                 0x7200000000000001:0x56         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3166@22001
    [started]  rgw_s3              0x7200000000000001:0x57         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3166@21001
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3169
    [started]  hax                 0x7200000000000001:0x58         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3169@22001
    [started]  rgw_s3              0x7200000000000001:0x59         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3169@21001
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3156
    [started]  hax                 0x7200000000000001:0x8          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3156@22001
    [started]  ioservice           0x7200000000000001:0x9          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3156@21001
    [started]  ioservice           0x7200000000000001:0xa          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3156@21002
    [started]  confd               0x7200000000000001:0xb          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3156@22002
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3160
    [started]  hax                 0x7200000000000001:0xc          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3160@22001
    [started]  ioservice           0x7200000000000001:0xd          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3160@21001
    [started]  ioservice           0x7200000000000001:0xe          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3160@21002
    [started]  confd               0x7200000000000001:0xf          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3160@22002
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3159
    [started]  hax                 0x7200000000000001:0x10         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3159@22001
    [started]  ioservice           0x7200000000000001:0x11         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3159@21001
    [started]  ioservice           0x7200000000000001:0x12         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3159@21002
    [started]  confd               0x7200000000000001:0x13         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3159@22002
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3031
    [started]  hax                 0x7200000000000001:0x14         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3031@22001
    [started]  ioservice           0x7200000000000001:0x15         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3031@21001
    [started]  ioservice           0x7200000000000001:0x16         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3031@21002
    [started]  confd               0x7200000000000001:0x17         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3031@22002
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3158
    [started]  hax                 0x7200000000000001:0x18         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3158@22001
    [started]  ioservice           0x7200000000000001:0x19         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3158@21001
    [started]  ioservice           0x7200000000000001:0x1a         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3158@21002
    [started]  confd               0x7200000000000001:0x1b         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3158@22002
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3169
    [started]  hax                 0x7200000000000001:0x1c         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3169@22001
    [started]  ioservice           0x7200000000000001:0x1d         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3169@21001
    [started]  ioservice           0x7200000000000001:0x1e         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3169@21002
    [started]  confd               0x7200000000000001:0x1f         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3169@22002
    cortx-data-headless-svc-ssc-vm-g4-rhev4-1591  (RC)
    [started]  hax                 0x7200000000000001:0x20         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1591@22001
    [started]  ioservice           0x7200000000000001:0x21         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1591@21001
    [started]  ioservice           0x7200000000000001:0x22         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1591@21002
    [started]  confd               0x7200000000000001:0x23         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1591@22002
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3170
    [started]  hax                 0x7200000000000001:0x24         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3170@22001
    [started]  ioservice           0x7200000000000001:0x25         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3170@21001
    [started]  ioservice           0x7200000000000001:0x26         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3170@21002
    [started]  confd               0x7200000000000001:0x27         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3170@22002
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3167
    [started]  hax                 0x7200000000000001:0x28         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3167@22001
    [started]  ioservice           0x7200000000000001:0x29         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3167@21001
    [started]  ioservice           0x7200000000000001:0x2a         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3167@21002
    [started]  confd               0x7200000000000001:0x2b         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3167@22002
    cortx-data-headless-svc-ssc-vm-g4-rhev4-1583
    [started]  hax                 0x7200000000000001:0x2c         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1583@22001
    [started]  ioservice           0x7200000000000001:0x2d         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1583@21001
    [started]  ioservice           0x7200000000000001:0x2e         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1583@21002
    [started]  confd               0x7200000000000001:0x2f         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1583@22002
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3157
    [started]  hax                 0x7200000000000001:0x30         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3157@22001
    [started]  ioservice           0x7200000000000001:0x31         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3157@21001
    [started]  ioservice           0x7200000000000001:0x32         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3157@21002
    [started]  confd               0x7200000000000001:0x33         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3157@22002
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3166
    [started]  hax                 0x7200000000000001:0x34         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3166@22001
    [started]  ioservice           0x7200000000000001:0x35         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3166@21001
    [started]  ioservice           0x7200000000000001:0x36         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3166@21002
    [started]  confd               0x7200000000000001:0x37         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3166@22002
    cortx-data-headless-svc-ssc-vm-g4-rhev4-1592
    [started]  hax                 0x7200000000000001:0x38         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1592@22001
    [started]  ioservice           0x7200000000000001:0x39         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1592@21001
    [started]  ioservice           0x7200000000000001:0x3a         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1592@21002
    [started]  confd               0x7200000000000001:0x3b         inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1592@22002
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3158
    [started]  hax                 0x7200000000000001:0x3c         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3158@22001
    [started]  rgw_s3              0x7200000000000001:0x3d         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3158@21001
Devices:
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3168
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdf
    [online]  /dev/sdg
    [online]  /dev/sdh
    cortx-data-headless-svc-ssc-vm-g4-rhev4-1590
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3031
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3159
    cortx-server-headless-svc-ssc-vm-g4-rhev4-1591
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3168
    cortx-server-headless-svc-ssc-vm-g4-rhev4-1592
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3156
    cortx-server-headless-svc-ssc-vm-g4-rhev4-1590
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3167
    cortx-server-headless-svc-ssc-vm-g4-rhev4-1583
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3170
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3160
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3157
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3166
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3169
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3156
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3160
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3159
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3031
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3158
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3169
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g4-rhev4-1591
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3170
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3167
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g4-rhev4-1583
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3157
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g2-rhev4-3166
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-data-headless-svc-ssc-vm-g4-rhev4-1592
    [online]  /dev/sdd
    [online]  /dev/sde
    [online]  /dev/sdc
    [online]  /dev/sdg
    [online]  /dev/sdh
    [online]  /dev/sdf
    cortx-server-headless-svc-ssc-vm-g2-rhev4-3158
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]#

[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]# hctl status -d | egrep 'offline|recovering|UNKNOWN|unknown'
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]#
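
The empty `egrep` output above confirms that no service or device is offline, recovering, or unknown. For reference, a minimal health-gate sketch of the same check (a hypothetical helper, not part of Hare; assumes `hctl` is on PATH and emits the listing format shown above):

# Hypothetical pre-test gate, equivalent to
# `hctl status -d | egrep 'offline|recovering|UNKNOWN|unknown'`.
import subprocess
import sys

BAD_STATES = ('offline', 'recovering', 'UNKNOWN', 'unknown')

status = subprocess.run(['hctl', 'status', '-d'],
                        stdout=subprocess.PIPE,
                        universal_newlines=True, check=True).stdout
bad = [line for line in status.splitlines()
       if any(state in line for state in BAD_STATES)]
if bad:
    sys.exit('cluster degraded:\n' + '\n'.join(bad))
print('cluster healthy')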

[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 cp 1G s3://test
upload: ./1G to s3://test/1G
[root@ssc-vm-g2-rhev4-3031 ~]#

# Degraded write

[root@ssc-vm-g2-rhev4-3031 ~]# kubectl get deployment cortx-data-ssc-vm-g2-rhev4-3159 -o yaml > cortx-data-ssc-vm-g2-rhev4-3159.yaml
[root@ssc-vm-g2-rhev4-3031 ~]# kubectl delete deployment cortx-data-ssc-vm-g2-rhev4-3159
deployment.apps "cortx-data-ssc-vm-g2-rhev4-3159" deleted
[root@ssc-vm-g2-rhev4-3031 ~]#

[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-3031 /]# consul kv get -recurse | grep STOPPED
cortx-data-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3159/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3159/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3159/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-3031 /]# exit
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 ls
2022-06-28 17:51:24 test
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 ls s3://test
2022-06-28 17:52:54 1073741824 1G
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 cp 1G s3://test/1G_2
upload: ./1G to s3://test/1G_2
[root@ssc-vm-g2-rhev4-3031 ~]#
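
Hare records one HA state per process under Consul KV keys of the form `<node>/processes/<fid>`, as dumped above. A small sketch (hypothetical; assumes `consul` is on PATH) that parses those JSON values and lists the processes still marked STOPPED:

# Hypothetical check: list Motr processes whose Consul KV state is
# M0_CONF_HA_PROCESS_STOPPED (key format as in the dump above).
import json
import subprocess

out = subprocess.run(['consul', 'kv', 'get', '-recurse'],
                     stdout=subprocess.PIPE,
                     universal_newlines=True, check=True).stdout
for line in out.splitlines():
    key, sep, value = line.partition(':{')   # JSON starts at the first ':{'
    if not sep or 'processes/' not in key:
        continue
    if json.loads('{' + value).get('state') == 'M0_CONF_HA_PROCESS_STOPPED':
        print(key)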

# Read after restart

[root@ssc-vm-g2-rhev4-3031 ~]# kubectl apply -f cortx-data-ssc-vm-g2-rhev4-3159.yaml
deployment.apps/cortx-data-ssc-vm-g2-rhev4-3159 created
[root@ssc-vm-g2-rhev4-3031 ~]#
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-3031 /]# consul kv get -recurse | grep STOPPED
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-3031 /]#
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 ls s3://test
2022-06-28 17:52:54 1073741824 1G
2022-06-28 18:03:05 1073741824 1G_2
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 cp s3://test/1G_2 ./
download: s3://test/1G_2 to ./1G_2
[root@ssc-vm-g2-rhev4-3031 ~]#
[root@ssc-vm-g2-rhev4-3031 ~]# diff 1G 1G_2
[root@ssc-vm-g2-rhev4-3031 ~]#
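
The empty `diff` above shows the object written in degraded mode reads back intact after the restart. The same integrity check can be done with checksums; a sketch assuming boto3 with the endpoint and credentials already configured in the environment:

# Hypothetical integrity check, equivalent to `diff 1G 1G_2` above:
# compare SHA-256 of the original file and the copy read back via S3.
import hashlib
import boto3

def sha256(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk), b''):
            h.update(block)
    return h.hexdigest()

s3 = boto3.client('s3')  # endpoint_url/credentials assumed preconfigured
s3.download_file('test', '1G_2', '1G_2.check')
assert sha256('1G') == sha256('1G_2.check'), 'object corrupted'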
supriyachavan4398 commented 2 years ago

Tested a 6N deployment without dtm enabled; ref. custom build at https://eos-jenkins.colo.seagate.com/job/GitHub-custom-ci-builds/job/generic/job/custom-ci/6991. Deployment completed successfully with SNS: 4+2+0 and DIX: 1+4+0 config. Manually tested IOs: happy-path IOs worked fine, and degraded write works fine.

[root@ssc-vm-g4-rhev4-1587 ~]# kubectl get pods
NAME                                                 READY   STATUS    RESTARTS      AGE
cortx-consul-client-5clwp                            1/1     Running   0             19m
cortx-consul-client-5k9hh                            1/1     Running   0             19m
cortx-consul-client-cjbzx                            1/1     Running   0             18m
cortx-consul-client-z47xc                            1/1     Running   0             19m
cortx-consul-client-zzks4                            1/1     Running   0             18m
cortx-consul-server-0                                1/1     Running   0             18m
cortx-consul-server-1                                1/1     Running   0             18m
cortx-consul-server-2                                1/1     Running   0             19m
cortx-control-68f7dbd6bd-rxlqp                       1/1     Running   0             16m
cortx-data-ssc-vm-g4-rhev4-1588-76dfcb5848-znr6l     4/4     Running   0             15m
cortx-data-ssc-vm-g4-rhev4-1589-589b684995-slv4c     4/4     Running   0             15m
cortx-data-ssc-vm-rhev4-2450-6896d75df9-w7qss        4/4     Running   0             15m
cortx-data-ssc-vm-rhev4-2451-795986bb96-dznqr        4/4     Running   0             15m
cortx-data-ssc-vm-rhev4-2635-df564987b-78wg8         4/4     Running   0             15m
cortx-ha-d9ff49645-rvz9x                             3/3     Running   0             10m
cortx-kafka-0                                        1/1     Running   0             21m
cortx-kafka-1                                        1/1     Running   1 (20m ago)   21m
cortx-kafka-2                                        1/1     Running   0             21m
cortx-server-ssc-vm-g4-rhev4-1588-64d958584f-45b6r   2/2     Running   0             13m
cortx-server-ssc-vm-g4-rhev4-1589-7f54f49d6c-vm9tn   2/2     Running   0             12m
cortx-server-ssc-vm-rhev4-2450-8d69fb6b7-mnl9j       2/2     Running   0             12m
cortx-server-ssc-vm-rhev4-2451-794df4997c-nwzvs      2/2     Running   0             12m
cortx-server-ssc-vm-rhev4-2635-5dbf9c9665-4hc6l      2/2     Running   0             12m
cortx-zookeeper-0                                    1/1     Running   0             21m
cortx-zookeeper-1                                    1/1     Running   0             21m
cortx-zookeeper-2                                    1/1     Running   0             21m

[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 mb s3://test --endpoint-url http://$IP:$Port
make_bucket: test
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 cp file_1gb s3://test/file_1gb --endpoint-url http://$IP:$Port
upload: ./file_1gb to s3://test/file_1gb

### Degraded Write:
[root@ssc-vm-g4-rhev4-1587 ~]# kubectl scale deploy cortx-data-ssc-vm-rhev4-2450 --replicas 0
deployment.apps/cortx-data-ssc-vm-rhev4-2450 scaled
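
The same failure injection can be scripted. A sketch using the kubernetes Python client (hypothetical; the `default` namespace is an assumption, adjust it to the CORTX namespace):

# Hypothetical failure injection, equivalent to
# `kubectl scale deploy cortx-data-ssc-vm-rhev4-2450 --replicas 0`.
from kubernetes import client, config

config.load_kube_config()
client.AppsV1Api().patch_namespaced_deployment_scale(
    name='cortx-data-ssc-vm-rhev4-2450',
    namespace='default',             # assumption: set to the CORTX namespace
    body={'spec': {'replicas': 0}})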

[root@cortx-data-headless-svc-ssc-vm-g4-rhev4-1588 9a5a467bbc2a4aa2ef3b12142e1598cb]# consul kv get -recurse processes | grep STOPPED
processes/0x7200000000000001:0x4:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
processes/0x7200000000000001:0x5:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0x6:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0x7:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
[root@cortx-data-headless-svc-ssc-vm-g4-rhev4-1588 9a5a467bbc2a4aa2ef3b12142e1598cb]# hctl status -d | grep offline
    [offline]  hax                 0x7200000000000001:0x4          inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2450@22001
    [offline]  ioservice           0x7200000000000001:0x5          inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2450@21001
    [offline]  ioservice           0x7200000000000001:0x6          inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2450@21002
    [offline]  confd               0x7200000000000001:0x7          inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2450@22002
    [offline]  /dev/sdd
    [offline]  /dev/sde
    [offline]  /dev/sdc
    [offline]  /dev/sdg
    [offline]  /dev/sdh
    [offline]  /dev/sdf

[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 ls s3://test/ --endpoint-url http://$IP:$Port
2022-06-29 03:34:53 1048576000 file_1gb
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 cp file_1gb s3://test/file_1gb_2 --endpoint-url http://$IP:$Port
upload: ./file_1gb to s3://test/file_1gb_2
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 ls s3://test/ --endpoint-url http://$IP:$Port
2022-06-29 03:34:53 1048576000 file_1gb
2022-06-29 03:46:58 1048576000 file_1gb_2

### Restarting failed pod:
[root@ssc-vm-g4-rhev4-1587 ~]# kubectl scale deploy cortx-data-ssc-vm-rhev4-2450 --replicas 1
deployment.apps/cortx-data-ssc-vm-rhev4-2450 scaled

[root@cortx-data-headless-svc-ssc-vm-g4-rhev4-1588 9a5a467bbc2a4aa2ef3b12142e1598cb]# hctl status -d | grep offline
[root@cortx-data-headless-svc-ssc-vm-g4-rhev4-1588 9a5a467bbc2a4aa2ef3b12142e1598cb]# consul kv get -recurse processes | grep STOPPED

### Tried to write new data objects:
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 cp file_1gb s3://test/file_1gb_3 --endpoint-url http://$IP:$Port
upload failed: ./file_1gb to s3://test/file_1gb_3 Read timeout on endpoint URL: "http://10.102.81.163:80/test/file_1gb_3?uploads"
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 ls s3://test/ --endpoint-url http://$IP:$Port

Read timeout on endpoint URL: "http://10.102.81.163:80/test?list-type=2&prefix=&delimiter=%2F&encoding-type=url"

New data objects cannot be written after the node restarts, and data written in degraded mode cannot be read back either.
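
With default AWS CLI settings these probes block for a long time before timing out. A fast-fail probe sketch (assumes boto3; the endpoint is taken from the failures above, credentials assumed preconfigured) that surfaces the hang quickly when re-checking the endpoint:

# Hypothetical fast-fail probe of the hung S3 endpoint: short timeouts,
# a single attempt, then list the test bucket.
import boto3
from botocore.config import Config

s3 = boto3.client(
    's3',
    endpoint_url='http://10.102.81.163:80',  # endpoint from the errors above
    config=Config(connect_timeout=5, read_timeout=15,
                  retries={'max_attempts': 1}))
print('objects in test:', s3.list_objects_v2(Bucket='test').get('KeyCount', 0))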

[root@ssc-vm-g4-rhev4-1587 ~]# kubectl logs cortx-data-ssc-vm-rhev4-2635-df564987b-78wg8 --all-containers
2022-06-29 10:08:30,469 [INFO] Starting Hare services
2022-06-29 10:08:30,530 [INFO] Entering logrotate_generic at line 161 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,531 [INFO] Entering get_log_dir at line 696 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,627 [INFO] Leaving get_log_dir
2022-06-29 10:08:30,628 [INFO] Leaving logrotate_generic
2022-06-29 10:08:30,635 [INFO] Entering start_hax_and_consul_without_systemd at line 414 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,636 [INFO] Entering get_config_dir at line 704 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,710 [INFO] Leaving get_config_dir
2022-06-29 10:08:30,710 [INFO] Entering get_log_dir at line 696 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,790 [INFO] Leaving get_log_dir
2022-06-29 10:08:30,791 [INFO] Entering _start_consul at line 240 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,858 [INFO] Entering Utils.get_local_hostname at line 91 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/utils.py
2022-06-29 10:08:30,859 [INFO] Entering Utils.get_hostname at line 79 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/utils.py
2022-06-29 10:08:30,859 [INFO] Leaving Utils.get_hostname
2022-06-29 10:08:30,859 [INFO] Leaving Utils.get_local_hostname
2022-06-29 10:08:30,860 [INFO] Entering ConsulStarter._execute at line 65 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/consul_starter.py
2022-06-29 10:08:35,919 [INFO] Leaving _start_consul
2022-06-29 10:08:35,919 [INFO] Entering _start_hax at line 298 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:35,920 [INFO] Entering HaxStarter._execute at line 54 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/hax_starter.py
2022-06-29 10:08:35,920 [INFO] Leaving _start_hax
2022-06-29 08:57:59,174 - executing command /usr/libexec/cortx-motr/motr-start m0d-0x7200000000000001:0x13
2022-06-29 08:57:59,248 - MOTR_M0D_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2635@22002
2022-06-29 08:57:59,249 - MOTR_PROCESS_FID: 0x7200000000000001:0x13
2022-06-29 08:57:59,249 - MOTR_HA_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2635@22001
2022-06-29 08:57:59,249 - MOTR_M0D_DATA_DIR: /etc/cortx/motr
2022-06-29 08:57:59,249 - MOTR_CONF_XC: /etc/motr/confd.xc
2022-06-29 08:57:59,266 - motr transport : libfab
2022-06-29 08:57:59,276 - Service FID: m0d-0x7200000000000001:0x13
2022-06-29 08:57:59,281 - BE log size is not configured
2022-06-29 08:57:59,281 - + exec /usr/bin/m0d -e libfab:inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2635@22002 -A linuxstob:/etc/cortx/log/motr/cc3999bb13ca76034b4dfca9adfa7f90/addb/m0d-0x7200000000000001:0x13/addb-stobs -f '<0x7200000000000001:0x13>' -T linux -S stobs -D db -m 524288 -q 64 -E 32 -J 64 -c /etc/motr/confd.xc -H inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2635@22001 -U -r 134217728
2022-06-29 08:58:03,783 - motr[00036]:  ba60   WARN  [ha/entrypoint.c:563:ha_entrypoint_client_fom_tick]  rlk_rc=-110
2022-06-29 08:58:07,785 - motr[00036]:  ba60   WARN  [ha/entrypoint.c:563:ha_entrypoint_client_fom_tick]  rlk_rc=-110
2022-06-29 08:58:11,786 - motr[00036]:  ba60   WARN  [ha/entrypoint.c:563:ha_entrypoint_client_fom_tick]  rlk_rc=-110
2022-06-29 08:58:36,474 - motr[00036]:  bb90  ERROR  [conf/helpers.c:552:m0_conf_process2service_get]  <! rc=-2
2022-06-29 08:58:36,475 - Started
2022-06-29 08:58:36,475 - m0d: systemd notifications not allowed
2022-06-29 08:58:36,475 -
2022-06-29 08:58:36,475 - Press CTRL+C to quit.
2022-06-29 09:50:35,565 - motr[00036]:  cf40  ERROR  [net/ip.c:452:m0_net_hostname_to_ip]  gethostbyname err=1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
2022-06-29 09:50:35,565 - motr[00036]:  cf40  ERROR  [net/ip.c:454:m0_net_hostname_to_ip]  <! rc=1
2022-06-29 09:50:35,566 - motr[00036]:  d0e0  ERROR  [net/libfab/libfab.c:2261:libfab_dns_resolve_retry]  gethostbyname() failed with err 1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local@21002
2022-06-29 09:52:42,844 - motr[00036]:  d3c0  ERROR  [rpc/frmops.c:576:item_fail]  packet 0x7ff048067230, item 0x7ff0480639b0[36] failed with ri_error=-110
2022-06-29 09:52:42,861 - motr[00036]:  cf10  ERROR  [net/ip.c:452:m0_net_hostname_to_ip]  gethostbyname err=1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
2022-06-29 09:52:42,861 - motr[00036]:  cf10  ERROR  [net/ip.c:454:m0_net_hostname_to_ip]  <! rc=1
2022-06-29 09:52:42,861 - motr[00036]:  d0b0  ERROR  [net/libfab/libfab.c:2261:libfab_dns_resolve_retry]  gethostbyname() failed with err 1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local@21002
2022-06-29 09:54:50,082 - motr[00036]:  d390  ERROR  [rpc/frmops.c:576:item_fail]  packet 0x7ff048067230, item 0x7ff0480639b0[38] failed with ri_error=-110
2022-06-29 09:54:50,083 - motr[00036]:  da20  ERROR  [rpc/link.c:154:rpc_link_conn_terminate]  Connection termination failed (rlink=0x55aa1c96fd80)
2022-06-29 09:54:50,083 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528641953] seconds in processing]: fom=0x55aa1c977318, fop 0x55aa1c9773d0[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528654010] seconds in processing]: fom=0x55aa1c99a7c8, fop 0x55aa1c99a880[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528656331] seconds in processing]: fom=0x55aa1c969138, fop 0x55aa1c9691f0[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528658287] seconds in processing]: fom=0x55aa1c9a18b8, fop 0x55aa1c9a1970[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528660188] seconds in processing]: fom=0x55aa1c97e408, fop 0x55aa1c97e4c0[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528661808] seconds in processing]: fom=0x55aa1c9854f8, fop 0x55aa1c9855b0[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528663396] seconds in processing]: fom=0x55aa1c9936d8, fop 0x55aa1c993790[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528664791] seconds in processing]: fom=0x55aa1c98c5e8, fop 0x55aa1c98c6a0[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528666209] seconds in processing]: fom=0x55aa1c9309b8, fop 0x55aa1c930a70[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528671757] seconds in processing]: fom=0x55aa1c937aa8, fop 0x55aa1c937b60[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528673657] seconds in processing]: fom=0x55aa1c95af58, fop 0x55aa1c95b010[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528675075] seconds in processing]: fom=0x55aa1c9298c8, fop 0x55aa1c929980[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528676347] seconds in processing]: fom=0x55aa1c962048, fop 0x55aa1c962100[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528670699] seconds in processing]: fom=0x55aa1c93eb98, fop 0x55aa1c93ec50[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528671883] seconds in processing]: fom=0x55aa1c945c88, fop 0x55aa1c945d40[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528673239] seconds in processing]: fom=0x55aa1c953e68, fop 0x55aa1c953f20[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]:  d9f0   WARN  [fop/fom.c:362:hung_fom_notify]  FOP HUNG[[254:528674765] seconds in processing]: fom=0x55aa1c94cd78, fop 0x55aa1c94ce30[0] phase: Initialised
2022-06-29 09:54:50,092 - motr[00036]:  cf40  ERROR  [net/ip.c:452:m0_net_hostname_to_ip]  gethostbyname err=1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
2022-06-29 09:54:50,092 - motr[00036]:  cf40  ERROR  [net/ip.c:454:m0_net_hostname_to_ip]  <! rc=1
2022-06-29 09:54:50,092 - motr[00036]:  d0e0  ERROR  [net/libfab/libfab.c:2261:libfab_dns_resolve_retry]  gethostbyname() failed with err 1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local@21002
2022-06-29 09:56:57,307 - motr[00036]:  d3c0  ERROR  [rpc/frmops.c:576:item_fail]  packet 0x7ff048044210, item 0x7ff0480639b0[36] failed with ri_error=-110
2022-06-29 09:56:57,339 - motr[00036]:  cf10  ERROR  [net/ip.c:452:m0_net_hostname_to_ip]  gethostbyname err=1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
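
The repeated `gethostbyname err=1` entries above are m0d retrying the per-pod DNS record of the restarted data pod under the headless service; that record disappears along with the old pod. A quick resolution probe (hypothetical diagnostic; hostname copied from the log):

# Hypothetical DNS probe for the stale per-pod record that m0d keeps
# retrying in the log above.
import socket

host = ('172-16-18-229.cortx-data-headless-svc-'
        'ssc-vm-rhev4-2450.cortx.svc.cluster.local')
try:
    print(host, '->', socket.gethostbyname(host))
except socket.gaierror as err:
    print(host, 'does not resolve:', err)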

Found a motr panic and errors in the hare-hax logs:

motr[00093]:  5900  FATAL  [lib/assert.c:50:m0_panic]  panic: fatal signal delivered at unknown() (unknown:0)  [git: 2.0.0-837-13-g77231467] /etc/cortx/hare/config/9a5a467bbc2a4aa2ef3b12142e1598cb/m0trace.93.2022-06-29-08:58:08
Motr panic: fatal signal delivered at unknown() unknown:0 (errno: 0) (last failed: none) [git: 2.0.0-837-13-g77231467] pid: 93  /etc/cortx/hare/config/9a5a467bbc2a4aa2ef3b12142e1598cb/m0trace.93.2022-06-29-08:58:08
Motr panic reason: signo: 11
/lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7fe5799766f3]
/lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7fe5799768c9]
/lib64/libmotr.so.2(m0_panic+0x13d)[0x7fe5799652cd]
/lib64/libmotr.so.2(+0x3a091c)[0x7fe57997691c]
/lib64/libpthread.so.0(+0x12b30)[0x7fe5821edb30]
/lib64/libmotr.so.2(m0_tlist_next+0xc)[0x7fe57996d12c]
/lib64/libmotr.so.2(+0x424dbe)[0x7fe5799fadbe]
/lib64/libmotr.so.2(m0_rpc_frm_enq_item+0x300)[0x7fe5799fb960]
/lib64/libmotr.so.2(m0_rpc_item_send+0x13c)[0x7fe579a009fc]
/lib64/libmotr.so.2(m0_rpc__post_locked+0x167)[0x7fe579a053f7]
/lib64/libmotr.so.2(m0_rpc_post+0x99)[0x7fe579a05629]
/lib64/libmotr.so.2(+0x36cd5a)[0x7fe579942d5a]
/lib64/libmotr.so.2(+0x35fb74)[0x7fe579935b74]
/lib64/libmotr.so.2(m0_thread_trampoline+0x5e)[0x7fe57996be5e]
/lib64/libmotr.so.2(+0x3a15b1)[0x7fe5799775b1]
/lib64/libpthread.so.0(+0x815a)[0x7fe5821e315a]
/lib64/libc.so.6(clone+0x43)[0x7fe581788dd3]
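
The anonymous `+0x...` frames above are offsets into libmotr.so.2 and can usually be symbolized with binutils addr2line, given the matching unstripped build. A sketch (hypothetical; offsets copied from the backtrace above):

# Hypothetical symbolizer for the anonymous libmotr frames above;
# addr2line maps a relative offset to function and file:line.
import subprocess

for offset in ('0x3a091c', '0x424dbe', '0x36cd5a', '0x35fb74'):
    out = subprocess.run(
        ['addr2line', '-f', '-C', '-e', '/lib64/libmotr.so.2', offset],
        stdout=subprocess.PIPE, universal_newlines=True).stdout
    print(offset, '->', ' '.join(out.split()))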

Checked this thrice, with a different config each time, and hit the same issue again and again. This motr panic was logged earlier in https://jts.seagate.com/browse/CORTX-31834 cc. @mssawant, @vaibhavparatwar, @pavankrishnat

supriyachavan4398 commented 2 years ago

Tested a 6N deployment with dtm enabled; ref. custom build at https://eos-jenkins.colo.seagate.com/job/GitHub-custom-ci-builds/job/generic/job/custom-ci/6993. Deployment completed successfully with SNS: 4+1+0 and DIX: 1+4+0 config. Started a new IOStabilityTestRuns job for degraded read type-3 at https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/210 cc. @mssawant, @vaibhavparatwar, @pavankrishnat

mssawant commented 2 years ago

retest this please

mssawant commented 2 years ago

retest this please

mssawant commented 2 years ago

Created https://jts.seagate.com/browse/CORTX-33263 for motr rpc crash seen in hax on data pod restart and failure.

mssawant commented 2 years ago

retest this please