Closed: mssawant closed this PR 2 years ago.
@mssawant what is the JIRA we can link this PR to?
Mandar, with the above changes, Happy Path IOs are failing on both 5N (sanityJob) and 15N (manual) VMs.
Build: custom-ci #6904
Deployment: setup-cortx-rgw-cluster #7759
Sanity: Hare_k8s_Sanity_PR #164
Sanity error: Hare_Sanity_164_consoleText.txt
Failed tests:
FAILED tests/cft/test_io_workload.py::TestIOWorkload::test_basic_io - commons...
FAILED tests/csm/rest/test_iam_users.py::TestIamUserRGW::test_37016 - commons...
FAILED tests/s3/test_data_path_validation.py::TestDataPathValidation::test_1701[1000-1M]
FAILED tests/s3/test_object_workflow_operations.py::TestObjectWorkflowOperations::test_delete_object_2220
Error info:
=================================== FAILURES ===================================
_________________________ TestIOWorkload.test_basic_io _________________________
        LOGGER.info("Putting object")
        try:
            response = super().put_object(bucket_name, object_name, file_path, **kwargs)
        except (ClientError, Exception) as error:
            LOGGER.error("Error in %s: %s", S3TestLib.put_object.__name__, error)
>           raise CTException(err.S3_CLIENT_ERROR, error.args[0]) from error
E       commons.exceptions.CTException: CTException: EC(4007)
E       Error Desc: S3 Client Error
E       Error Message: An error occurred (UnknownError) when calling the PutObject operation (reached max retries: 6): Unknown
E       Other info:
E       {}
On 15N:
Write file...
+ aws s3 ls s3://test/ --endpoint-url http://10.110.208.81:80
+ aws s3 cp sanityIO1gb s3://test/sanityIO1gb --endpoint-url http://10.110.208.81:80
upload failed: ./sanityIO1gb to s3://test/sanityIO1gb An error occurred (InvalidArgument) when calling the UploadPart operation: Unknown
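When an upload fails like this, the server-side detail usually lands in the RGW container logs on the server pods; a minimal sketch (the pod name is a placeholder, not from this run):

```
# Sketch: find a server pod and pull recent RGW logs (pod name is hypothetical).
kubectl get pods | grep cortx-server
kubectl logs cortx-server-<node>-<hash> -c cortx-rgw --tail=100
```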
Degraded test failed: https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/185/console (test_degraded_iteration_write_read_partial_delete DegradedPath-Type4)
[2022-06-24 04:23:41] [MainThread] [DEBUG ] [system_utils.py: 165]: Command: s3bench -accessKey=IOF6TNDN8C2TOETQYIOC -accessSecret=Zb2dlZb51Ru84WJU15tlGXr7Xx2RW2WxnRel9uj9 -bucket=test-40174-bkt-0-1656066194.826625 -endpoint=https://192.168.62.236:30443/ -numClients=10 -numSamples=315 -objectNamePrefix=obj_134217728 -objectSize=134217728b -skipSSLCertVerification=True -s3MaxRetries=5 -httpClientTimeout=500000 -region us-east-1 -skipCleanup -validate >> log/latest/write_workload_134217728b_s3bench_10_315_134217728b_24-06-2022-04-23-41-034681.log 2>&1
[2022-06-24 04:25:53] [MainThread] [DEBUG ] [system_utils.py: 170]: output = b''
[2022-06-24 04:25:53] [MainThread] [DEBUG ] [system_utils.py: 171]: error = b''
[2022-06-24 04:25:53] [MainThread] [DEBUG ] [s3bench.py: 251]: Response: (True, "b''")
[2022-06-24 04:25:53] [MainThread] [INFO ] [s3bench.py: 253]: Workload execution completed.
[2022-06-24 04:25:53] [MainThread] [DEBUG ] [s3bench.py: 97]: list response ["b''"]
[2022-06-24 04:25:53] [MainThread] [INFO ] [near_full_data_storage.py: 134]: Workload: 315 objects of 134217728 with 10 parallel clients
[2022-06-24 04:25:53] [MainThread] [INFO ] [near_full_data_storage.py: 135]: Log Path log/latest/write_workload_134217728b_s3bench_10_315_134217728b_24-06-2022-04-23-41-034681.log
[2022-06-24 04:25:53] [MainThread] [INFO ] [s3bench.py: 139]: Debug: Log File Path log/latest/write_workload_134217728b_s3bench_10_315_134217728b_24-06-2022-04-23-41-034681.log
[2022-06-24 04:25:53] [MainThread] [INFO ] [s3bench.py: 150]: 'Error count' filtered list: [' Errors Count: 258\n', ' Errors Count: 258\n', ' Errors Count: 258\n']
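For quick triage, the error counts can be read straight from the s3bench log named in the run above; a minimal sketch:

```
# Sketch: surface s3bench errors from the run's log (path taken from the log above).
LOG=log/latest/write_workload_134217728b_s3bench_10_315_134217728b_24-06-2022-04-23-41-034681.log
grep 'Errors Count' "$LOG"
# Non-zero counts (258 here) mean the workload failed even though s3bench exited cleanly.
```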
After a fresh re-deployment (1+14N cluster), Happy Path IOs passed; now testing degraded IOs. Degraded-mode IO job: https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/193/console (failed in Happy Path).
Manually tested IOs: successfully read/wrote 1 GB files 4 times in degraded mode, after failing a data pod with (a sketch of the cycle follows):
kubectl scale deploy cortx-data-ssc-vm-g3-rhev4-2278 --replicas 0
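Roughly, the cycle was (a sketch, not the exact commands; the endpoint is the one used earlier in this thread):

```
# Sketch of the manual degraded-IO check.
DEPLOY=cortx-data-ssc-vm-g3-rhev4-2278
ENDPOINT=http://10.110.208.81:80

kubectl scale deploy "$DEPLOY" --replicas 0                            # fail one data pod
aws s3 cp sanityIO1gb s3://test/dg_write --endpoint-url "$ENDPOINT"    # degraded write
aws s3 cp s3://test/dg_write dg_read --endpoint-url "$ENDPOINT"        # degraded read
diff sanityIO1gb dg_read && echo "degraded IO OK"
kubectl scale deploy "$DEPLOY" --replicas 1                            # restore the pod
```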
1N deployment fails due to a remove-bucket failure:
16:26:21 --------------- Remove 'test-bucket' bucket ---------------
16:26:21
16:26:22 remove_bucket failed: s3://test-bucket An error occurred (BucketNotEmpty) when calling the DeleteBucket operation: Unknown
Hare sanity fails with the same error:
16:30:21 --------------- Remove 'test-bucket' bucket ---------------
16:30:21
16:30:21 remove_bucket failed: s3://test-bucket An error occurred (BucketNotEmpty) when calling the DeleteBucket operation: Unknown
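If the bucket really does hold leftover objects, the sanity cleanup could empty it before deleting; a minimal sketch with the AWS CLI (endpoint flags omitted):

```
# Sketch: avoid BucketNotEmpty by emptying the bucket first.
aws s3 rm s3://test-bucket --recursive   # delete all remaining objects
aws s3 rb s3://test-bucket               # DeleteBucket should now succeed
```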
Tested a 15N deployment without DTM enabled: ioservice restart with continuous IO and degraded IO both worked. Tested a 15N deployment with DTM enabled: deployment completed for data and server pods, but data and server pods kept restarting before the cluster eventually stabilized.
Testing:
Custom build: https://eos-jenkins.colo.seagate.com/job/GitHub-custom-ci-builds/job/generic/job/custom-ci/6938/
Deployment successful at https://eos-jenkins.colo.seagate.com/job/Cortx-Automation/job/RGW/job/setup-cortx-rgw-cluster/7924/
Error info: for the data pods, the motr ioservice containers restart repeatedly:
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl logs cortx-data-ssc-vm-g2-rhev4-1632-5f9884c6d6-zd98q -c cortx-motr-io-001
2022-06-27 10:48:01,967 - executing command /usr/libexec/cortx-motr/motr-start m0d-0x7200000000000001:0x19
2022-06-27 10:48:02,017 - MOTR_M0D_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@21001
2022-06-27 10:48:02,019 - MOTR_PROCESS_FID: 0x7200000000000001:0x19
2022-06-27 10:48:02,019 - MOTR_HA_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@22001
2022-06-27 10:48:02,019 - MOTR_M0D_DATA_DIR: /etc/cortx/motr
2022-06-27 10:48:02,041 - motr transport : libfab
2022-06-27 10:48:02,063 - Service FID: m0d-0x7200000000000001:0x19
2022-06-27 10:48:02,080 - BE log size is not configured
2022-06-27 10:48:02,080 - + exec /usr/bin/m0d -e libfab:inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@21001 -A linuxstob:/etc/cortx/log/motr/4d00d87bf8a357c6471d06636ef677c5/addb/m0d-0x7200000000000001:0x19/addb-stobs -f '<0x7200000000000001:0x19>' -T ad -S stobs -D db -m 524288 -q 64 -E 32 -J 64 -H inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@22001 -U -B /dev/sdc -z 26843545600 -r 134217728
2022-06-27 10:48:10,281 - motr[00054]: b060 ERROR [reqh/reqh.c:454:m0_reqh_fop_allow] <! rc=-111
2022-06-27 10:48:10,282 - motr[00054]: b0e0 WARN [reqh/reqh.c:561:m0_reqh_fop_handle] fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:10,282 - motr[00054]: b0e0 ERROR [reqh/reqh.c:568:m0_reqh_fop_handle] <! rc=-108 Service shutdown.
2022-06-27 10:48:11,281 - motr[00054]: b060 ERROR [reqh/reqh.c:454:m0_reqh_fop_allow] <! rc=-111
2022-06-27 10:48:11,282 - motr[00054]: b0e0 WARN [reqh/reqh.c:561:m0_reqh_fop_handle] fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:11,282 - motr[00054]: b0e0 ERROR [reqh/reqh.c:568:m0_reqh_fop_handle] <! rc=-108 Service shutdown.
2022-06-27 10:48:12,282 - motr[00054]: b060 ERROR [reqh/reqh.c:454:m0_reqh_fop_allow] <! rc=-111
2022-06-27 10:48:12,282 - motr[00054]: b0e0 WARN [reqh/reqh.c:561:m0_reqh_fop_handle] fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:12,282 - motr[00054]: b0e0 ERROR [reqh/reqh.c:568:m0_reqh_fop_handle] <! rc=-108 Service shutdown.
2022-06-27 10:48:13,282 - motr[00054]: b060 ERROR [reqh/reqh.c:454:m0_reqh_fop_allow] <! rc=-111
2022-06-27 10:48:13,282 - motr[00054]: b0e0 WARN [reqh/reqh.c:561:m0_reqh_fop_handle] fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:13,282 - motr[00054]: b0e0 ERROR [reqh/reqh.c:568:m0_reqh_fop_handle] <! rc=-108 Service shutdown.
2022-06-27 10:48:14,282 - motr[00054]: b060 ERROR [reqh/reqh.c:454:m0_reqh_fop_allow] <! rc=-111
2022-06-27 10:48:14,283 - motr[00054]: b0e0 WARN [reqh/reqh.c:561:m0_reqh_fop_handle] fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:14,283 - motr[00054]: b0e0 ERROR [reqh/reqh.c:568:m0_reqh_fop_handle] <! rc=-108 Service shutdown.
2022-06-27 10:48:15,288 - motr[00054]: b060 ERROR [reqh/reqh.c:454:m0_reqh_fop_allow] <! rc=-111
2022-06-27 10:48:15,289 - motr[00054]: b0e0 WARN [reqh/reqh.c:561:m0_reqh_fop_handle] fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:15,289 - motr[00054]: b0e0 ERROR [reqh/reqh.c:568:m0_reqh_fop_handle] <! rc=-108 Service shutdown.
2022-06-27 10:48:16,283 - motr[00054]: b060 ERROR [reqh/reqh.c:454:m0_reqh_fop_allow] <! rc=-111
2022-06-27 10:48:16,283 - motr[00054]: b0e0 WARN [reqh/reqh.c:561:m0_reqh_fop_handle] fop "DTM0 redo"@0x7f552c026880 disallowed: -111.
2022-06-27 10:48:16,283 - motr[00054]: b0e0 ERROR [reqh/reqh.c:568:m0_reqh_fop_handle] <! rc=-108 Service shutdown.
2022-06-27 10:48:16,887 - motr[00054]: b060 ERROR [reqh/reqh.c:454:m0_reqh_fop_allow] <! rc=-111
p 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:31:55,237 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2632:435681331] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:32:08,969 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2646:168313715] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:32:57,518 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2694:716999746] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:33:46,788 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2743:986793723] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:34:39,075 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2796:273196139] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:02,138 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2819:337245942] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:08,552 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2825:750653292] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:19,937 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2837:136528432] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:26,134 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2843:333003119] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:46,423 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2863:620947470] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:35:57,450 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2874:648941618] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:36:09,491 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2886:689949617] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:36:51,014 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2928:213002793] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:37:13,985 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2951:184347233] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:37:22,946 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[2960:144947794] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:38:05,007 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[3002:205868984] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:39:26,802 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[3084:000385896] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
2022-06-27 11:39:41,640 - motr[00054]: 79f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[3098:839329248] seconds in processing]: fom=0x7ffcb259ccf0, fop 0x7ffcb259cda8[0] phase: HEC_FOM_INIT
For the server pods, the cortx-rgw container restarts many times:
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:469149801] seconds in processing]: fom=0x55a166f1c000, fop 0x55a166f1c0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468970870] seconds in processing]: fom=0x55a166ecf000, fop 0x55a166ecf0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468966147] seconds in processing]: fom=0x55a166ed1000, fop 0x55a166ed10b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468966303] seconds in processing]: fom=0x55a16701e000, fop 0x55a16701e0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468962428] seconds in processing]: fom=0x55a166f22000, fop 0x55a166f220b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468957949] seconds in processing]: fom=0x55a166fff000, fop 0x55a166fff0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468912155] seconds in processing]: fom=0x55a167029000, fop 0x55a1670290b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468912028] seconds in processing]: fom=0x55a167026000, fop 0x55a1670260b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468907147] seconds in processing]: fom=0x55a166f11000, fop 0x55a166f110b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468897861] seconds in processing]: fom=0x55a167001000, fop 0x55a1670010b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468944515] seconds in processing]: fom=0x55a16701b000, fop 0x55a16701b0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468946415] seconds in processing]: fom=0x55a1670bb000, fop 0x55a1670bb0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468951873] seconds in processing]: fom=0x55a1670bd000, fop 0x55a1670bd0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468960705] seconds in processing]: fom=0x55a1670d1000, fop 0x55a1670d10b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468956818] seconds in processing]: fom=0x55a1670d3000, fop 0x55a1670d30b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:468953819] seconds in processing]: fom=0x55a166fee000, fop 0x55a166fee0b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:377866106] seconds in processing]: fom=0x55a166f26000, fop 0x55a166f260b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:362781693] seconds in processing]: fom=0x55a166e52000, fop 0x55a166e520b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[133:331610089] seconds in processing]: fom=0x55a166f20000, fop 0x55a166f200b8[0] phase: RFS_WAITING
[2022-06-27 11:42:49] motr[00010]: 63f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[128:607100602] seconds in processing]: fom=0x55a166e50000, fop 0x55a166e500b8[0] phase: RFS_WAITING
[2022-06-27 11:43:06] 2022-06-27T11:43:06.557+0000 7f1fd3f45700 -1 rgw dbstore: Initialization timeout, failed to initialize
[root@ssc-vm-g2-rhev4-1630 ~]#
Degraded IO failed with an error. Failed one of the data pods by scaling its deployment to zero replicas: kubectl scale deploy cortx-data-ssc-vm-rhev4-2450 --replicas 0
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 ls --endpoint-url http://$IP:$Port
2022-06-27 05:00:18 test
2022-06-27 05:00:25 test2
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 cp sanityIO1gb s3://test2/sanityIO1gb --endpoint-url http://$IP:$Port
upload failed: ./sanityIO1gb to s3://test2/sanityIO1gb An error occurred (UnknownError) when calling the CreateMultipartUpload operation (reached max retries: 4): Unknown
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 cp sanityIO1gb s3://test2/sanityIO1gb1 --endpoint-url http://$IP:$Port
upload failed: ./sanityIO1gb to s3://test2/sanityIO1gb1 An error occurred (UnknownError) when calling the CreateMultipartUpload operation (reached max retries: 4): Unknown
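Both failures are in CreateMultipartUpload, so forcing a single-part upload can tell whether only the multipart path is broken; a sketch using a standard AWS CLI setting:

```
# Sketch: raise the multipart threshold so a 1 GB cp uses plain PutObject.
# If this succeeds while the default cp fails, the fault is in the multipart path.
aws configure set default.s3.multipart_threshold 2GB
aws s3 cp sanityIO1gb s3://test2/sanityIO1gb_singlepart --endpoint-url http://$IP:$Port
```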
cc: @mssawant, @pavankrishnat, @vaibhavparatwar
In degraded state, ran an s3bench command to test continuous read/write IOs. s3bench panicked, and data pod restarts were seen.
[root@ssc-vm-g4-rhev4-1587 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cortx-consul-client-49j9c 1/1 Running 0 6h31m
cortx-consul-client-5znhr 1/1 Running 0 6h30m
cortx-consul-client-knb52 1/1 Running 0 6h30m
cortx-consul-client-n4fkc 1/1 Running 0 6h31m
cortx-consul-client-pkttk 1/1 Running 0 6h32m
cortx-consul-server-0 1/1 Running 0 6h30m
cortx-consul-server-1 1/1 Running 0 6h31m
cortx-consul-server-2 1/1 Running 0 6h32m
cortx-control-646746fdf4-54zcj 1/1 Running 0 6h29m
cortx-data-ssc-vm-g4-rhev4-1588-79c7dd59fb-kmx2n 4/4 Running 1 (95m ago) 6h28m
cortx-data-ssc-vm-g4-rhev4-1589-d99d5899-5d9lt 4/4 Running 0 6h28m
cortx-data-ssc-vm-rhev4-2450-668b7fbccc-mfd2k 4/4 Running 1 (70m ago) 101m
cortx-data-ssc-vm-rhev4-2451-5cccc45c78-wp4zv 4/4 Running 1 (66m ago) 6h28m
cortx-data-ssc-vm-rhev4-2635-6f8575ddcd-kcs8p 4/4 Running 1 (86m ago) 6h28m
cortx-ha-6bc5b9557-lqhpm 3/3 Running 0 6h24m
cortx-kafka-0 1/1 Running 0 6h34m
cortx-kafka-1 1/1 Running 0 6h34m
cortx-kafka-2 1/1 Running 0 6h34m
cortx-server-ssc-vm-g4-rhev4-1588-59fcb59654-6wtkp 2/2 Running 0 6h26m
cortx-server-ssc-vm-g4-rhev4-1589-b57677c44-8b45x 2/2 Running 0 6h26m
cortx-server-ssc-vm-rhev4-2450-7fbf87b59b-w4c24 2/2 Running 0 6h26m
cortx-server-ssc-vm-rhev4-2451-6c8cbd687d-qbv2g 2/2 Running 0 6h26m
cortx-server-ssc-vm-rhev4-2635-f678545db-lnpxg 2/2 Running 0 6h26m
cortx-zookeeper-0 1/1 Running 0 6h34m
cortx-zookeeper-1 1/1 Running 0 6h34m
cortx-zookeeper-2 1/1 Running 0 6h34m
[root@ssc-vm-g4-rhev4-1587 ~]# ./s3bench.2020-04-09 -accessKey sgiamadmin -accessSecret ldapadmin -bucket test-bucket1 -endpoint http://10.96.157.21:8081 -numClients 1 -numSamples 1 -objectNamePrefix=s3workload -objectSize 1Mb -verbose > /root/s3bench_1Mb_50Ksamples.log -region us-east-1
panic: Failed to create bucket: RequestError: send request failed
caused by: Put http://10.96.157.21:8081/test-bucket1: dial tcp 10.96.157.21:8081: i/o timeout
goroutine 1 [running]:
main.(*Params).prepareBucket(0xc0001cadd0, 0xc00030a000, 0x8a6dc9)
/home/720554/proj/mero_s3bench/s3bench/s3bench.go:54 +0x32f
main.main()
/home/720554/proj/mero_s3bench/s3bench/s3bench.go:179 +0xfa5
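The dial tcp ... i/o timeout means the client never reached the endpoint, so checking reachability before launching s3bench separates network issues from server-side ones; a sketch:

```
# Sketch: confirm the RGW endpoint answers at all before starting the workload.
ENDPOINT=http://10.96.157.21:8081
curl -sS -o /dev/null -w 'HTTP %{http_code}\n' --max-time 5 "$ENDPOINT" \
  || echo "endpoint unreachable: check the service and server pods first"
```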
In the hare-hax logs, found a motr panic:
motr[00097]: 7c40 ERROR [spiel/cmd.c:2126:spiel_proc_counter_item_rlink_cb] connect failed
2022-06-28 10:01:59,682 [ERROR] {byte-count-updater} Failed due to Bytecount stats unavailable. Aborting this iteration. Waiting for next attempt.
Traceback (most recent call last):
File "/opt/seagate/cortx/hare/lib64/python3.6/site-packages/hax/bytecount.py", line 141, in _execute
motr.get_proc_bytecount(ios)
File "/opt/seagate/cortx/hare/lib64/python3.6/site-packages/hax/motr/__init__.py", line 754, in get_proc_bytecount
raise BytecountException('Bytecount stats unavailable')
hax.exception.BytecountException
motr[00097]: fbc0 FATAL [lib/assert.c:50:m0_panic] panic: fatal signal delivered at unknown() (unknown:0) [git: 2.0.0-837-7-g03498588] /etc/cortx/hare/config/671459a81222ef684c19108fc9b89516/m0trace.97.2022-06-28-05:10:37
Motr panic: fatal signal delivered at unknown() unknown:0 (errno: 111) (last failed: none) [git: 2.0.0-837-7-g03498588] pid: 97 /etc/cortx/hare/config/671459a81222ef684c19108fc9b89516/m0trace.97.2022-06-28-05:10:37
Motr panic reason: signo: 11
/lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7fd4a22b5573]
/lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7fd4a22b5749]
/lib64/libmotr.so.2(m0_panic+0x13d)[0x7fd4a22a414d]
/lib64/libmotr.so.2(+0x3a079c)[0x7fd4a22b579c]
/lib64/libpthread.so.0(+0x12b30)[0x7fd4aab2cb30]
/lib64/libmotr.so.2(m0_tlist_next+0xc)[0x7fd4a22abfac]
/lib64/libmotr.so.2(+0x424c1e)[0x7fd4a2339c1e]
/lib64/libmotr.so.2(m0_rpc_frm_enq_item+0x2f0)[0x7fd4a233a7b0]
/lib64/libmotr.so.2(m0_rpc_item_send+0x13c)[0x7fd4a233f84c]
/lib64/libmotr.so.2(+0x42ba1e)[0x7fd4a2340a1e]
/lib64/libmotr.so.2(m0_sm_asts_run+0x131)[0x7fd4a234cab1]
/lib64/libmotr.so.2(m0_rpc_machine_lock+0x43)[0x7fd4a2345bb3]
/lib64/libmotr.so.2(rpc_worker_thread_fn+0x69)[0x7fd4a2346149]
/lib64/libmotr.so.2(m0_thread_trampoline+0x5e)[0x7fd4a22aacde]
/lib64/libmotr.so.2(+0x3a1431)[0x7fd4a22b6431]
/lib64/libpthread.so.0(+0x815a)[0x7fd4aab2215a]
/lib64/libc.so.6(clone+0x43)[0x7fd4aa0c7dd3]
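The panic message names an m0trace dump; if the m0trace utility is available in the hax container, the file can be inspected directly (a sketch; the -i input flag is an assumption, verify locally):

```
# Sketch: dump the Motr trace file referenced by the panic (flag assumed).
m0trace -i /etc/cortx/hare/config/671459a81222ef684c19108fc9b89516/m0trace.97.2022-06-28-05:10:37 | tail -50
```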
2022-06-28 05:10:27,803 - executing command /usr/libexec/cortx-motr/motr-start m0d-0x7200000000000001:0x3
2022-06-28 05:10:27,873 - MOTR_M0D_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1588@22002
2022-06-28 05:10:27,875 - MOTR_PROCESS_FID: 0x7200000000000001:0x3
2022-06-28 05:10:27,875 - MOTR_HA_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1588@22001
2022-06-28 05:10:27,875 - MOTR_M0D_DATA_DIR: /etc/cortx/motr
2022-06-28 05:10:27,875 - MOTR_CONF_XC: /etc/motr/confd.xc
2022-06-28 05:10:27,899 - motr transport : libfab
2022-06-28 05:10:27,916 - Service FID: m0d-0x7200000000000001:0x3
2022-06-28 05:10:27,951 - BE log size is not configured
2022-06-28 05:10:27,951 - + exec /usr/bin/m0d -e libfab:inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1588@22002 -A linuxstob:/etc/cortx/log/motr/671459a81222ef684c19108fc9b89516/addb/m0d-0x7200000000000001:0x3/addb-stobs -f '<0x7200000000000001:0x3>' -T linux -S stobs -D db -m 524288 -q 64 -E 32 -J 64 -c /etc/motr/confd.xc -H inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1588@22001 -U -r 134217728
2022-06-28 05:10:32,908 - motr[00036]: da60 WARN [ha/entrypoint.c:563:ha_entrypoint_client_fom_tick] rlk_rc=-110
2022-06-28 05:10:36,913 - motr[00036]: da60 WARN [ha/entrypoint.c:563:ha_entrypoint_client_fom_tick] rlk_rc=-110
2022-06-28 05:11:16,604 - motr[00036]: b190 ERROR [conf/helpers.c:552:m0_conf_process2service_get] <! rc=-2
2022-06-28 05:11:16,605 - Started
2022-06-28 05:11:16,605 - m0d: systemd notifications not allowed
2022-06-28 05:11:16,605 -
2022-06-28 05:11:16,605 - Press CTRL+C to quit.
2022-06-28 09:39:49,146 - motr[00036]: d2a0 ERROR [pool/pool_machine.c:783:m0_poolmach_state_transit] <7600000000000001:0>: nr_failures:3 max_failures:2 event_index:6 event_state:3
2022-06-28 09:39:49,148 - motr[00036]: d2a0 ERROR [pool/pool_machine.c:783:m0_poolmach_state_transit] <7600000000000001:0>: nr_failures:4 max_failures:2 event_index:7 event_state:3
2022-06-28 09:56:51,777 - motr[00036]: d2a0 ERROR [pool/pool_machine.c:783:m0_poolmach_state_transit] <7600000000000001:0>: nr_failures:3 max_failures:2 event_index:4 event_state:1
2022-06-28 09:56:51,869 - motr[00036]: af40 ERROR [net/ip.c:452:m0_net_hostname_to_ip] gethostbyname err=1 for 172-16-18-246.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
2022-06-28 09:56:51,869 - motr[00036]: af40 ERROR [net/ip.c:454:m0_net_hostname_to_ip] <! rc=1
2022-06-28 09:56:51,869 - motr[00036]: b0e0 ERROR [net/libfab/libfab.c:2261:libfab_dns_resolve_retry] gethostbyname() failed with err 1 for 172-16-18-246.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local@21001
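The gethostbyname failures point at DNS for the failed pod's headless-service name; that can be confirmed from inside any cortx container (a sketch; the hostname is taken from the log above):

```
# Sketch: check whether the per-pod headless-service name still resolves.
HOST=172-16-18-246.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
getent hosts "$HOST" || echo "no DNS record, expected while the pod is scaled down"
```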
cc: @mssawant, @vaibhavparatwar, @pavankrishnat
retest this please
Tested a 6N deployment: regular IO, degraded IO with node failure, and degraded IO with ioservice restart.
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cortx-consul-client-4fd95 1/1 Running 0 11m
cortx-consul-client-68kkv 1/1 Running 0 10m
cortx-consul-client-7t629 1/1 Running 0 11m
cortx-consul-client-dt67m 1/1 Running 0 11m
cortx-consul-client-vhcqs 1/1 Running 0 11m
cortx-consul-client-zp57r 1/1 Running 0 11m
cortx-consul-server-0 1/1 Running 0 9m34s
cortx-consul-server-1 1/1 Running 0 10m
cortx-consul-server-2 1/1 Running 0 11m
cortx-control-6555bcd848-s4h8g 1/1 Running 0 9m2s
cortx-data-ssc-vm-g2-rhev4-1630-856bb78668-7kbjx 4/4 Running 0 7m57s
cortx-data-ssc-vm-g2-rhev4-1631-84b56d8955-6n4qg 4/4 Running 0 7m56s
cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp 4/4 Running 0 7m55s
cortx-data-ssc-vm-g2-rhev4-1635-7bb4cc8b75-n5dpg 4/4 Running 0 7m54s
cortx-data-ssc-vm-g2-rhev4-2237-86787d97f8-2rs5b 4/4 Running 0 7m53s
cortx-data-ssc-vm-g2-rhev4-2238-777c6f78cf-md6ph 4/4 Running 0 7m52s
cortx-ha-5769c7f7cc-zcg4m 3/3 Running 0 4m16s
cortx-kafka-0 1/1 Running 0 13m
cortx-kafka-1 1/1 Running 0 13m
cortx-kafka-2 1/1 Running 0 13m
cortx-server-ssc-vm-g2-rhev4-1630-5fd67bb9b8-t46m4 2/2 Running 0 6m13s
cortx-server-ssc-vm-g2-rhev4-1631-77d89d568b-gd5mc 2/2 Running 0 6m12s
cortx-server-ssc-vm-g2-rhev4-1632-5597c88b68-r58rd 2/2 Running 0 6m12s
cortx-server-ssc-vm-g2-rhev4-1635-8667f864b6-t65gt 2/2 Running 0 6m11s
cortx-server-ssc-vm-g2-rhev4-2237-8b4445548-vvv94 2/2 Running 0 6m10s
cortx-server-ssc-vm-g2-rhev4-2238-5d6f9f6d9d-v9tc7 2/2 Running 0 6m9s
cortx-zookeeper-0 1/1 Running 0 13m
cortx-zookeeper-1 1/1 Running 0 13m
cortx-zookeeper-2 1/1 Running 0 13m
[root@ssc-vm-g2-rhev4-1630 ~]#
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 mb s3://test
make_bucket: test
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 cp file_1G s3://test
upload: ./file_1G to s3://test/file_1G
[root@ssc-vm-g2-rhev4-1630 ~]#
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
cortx-control 1/1 1 1 13m
cortx-data-ssc-vm-g2-rhev4-1630 1/1 1 1 11m
cortx-data-ssc-vm-g2-rhev4-1631 1/1 1 1 11m
cortx-data-ssc-vm-g2-rhev4-1632 1/1 1 1 11m
cortx-data-ssc-vm-g2-rhev4-1635 1/1 1 1 11m
cortx-data-ssc-vm-g2-rhev4-2237 1/1 1 1 11m
cortx-data-ssc-vm-g2-rhev4-2238 1/1 1 1 11m
cortx-ha 1/1 1 1 8m16s
cortx-server-ssc-vm-g2-rhev4-1630 1/1 1 1 10m
cortx-server-ssc-vm-g2-rhev4-1631 1/1 1 1 10m
cortx-server-ssc-vm-g2-rhev4-1632 1/1 1 1 10m
cortx-server-ssc-vm-g2-rhev4-1635 1/1 1 1 10m
cortx-server-ssc-vm-g2-rhev4-2237 1/1 1 1 10m
cortx-server-ssc-vm-g2-rhev4-2238 1/1 1 1 10m
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl get deployment cortx-data-ssc-vm-g2-rhev4-1635 -o yaml > cortx-data-ssc-vm-g2-rhev4-1635.yaml
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl delete deployment cortx-data-ssc-vm-g2-rhev4-1635
deployment.apps "cortx-data-ssc-vm-g2-rhev4-1635" deleted
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl exec -it cortx-data-ssc-vm-g2-rhev4-1630-856bb78668-7kbjx -c cortx-hax -- /bin/bash
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1630 /]#
# Degraded write
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-1630 /]# consul kv get -recurse processes | grep STOPPED
processes/0x7200000000000001:0xc:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
processes/0x7200000000000001:0xd:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0xe:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0xf:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-1630 /]# hctl status -d | grep offline
[offline] hax 0x7200000000000001:0xc inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1635@22001
[offline] ioservice 0x7200000000000001:0xd inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1635@21001
[offline] ioservice 0x7200000000000001:0xe inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1635@21002
[offline] confd 0x7200000000000001:0xf inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1635@22002
[offline] /dev/sdd
[offline] /dev/sde
[offline] /dev/sdc
[offline] /dev/sdg
[offline] /dev/sdh
[offline] /dev/sdf
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-1630 /]#
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 cp file_1G_2 s3://test
upload: ./file_1G_2 to s3://test/file_1G_2
[root@ssc-vm-g2-rhev4-1630 ~]#
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 ls s3://test
2022-06-28 10:30:59 1073741824 file_1G
2022-06-28 11:02:21 1073741824 file_1G_2
[root@ssc-vm-g2-rhev4-1630 ~]#
# Restarting failed pod
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl apply -f cortx-data-ssc-vm-g2-rhev4-1635.yaml
deployment.apps/cortx-data-ssc-vm-g2-rhev4-1635 created
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cortx-consul-client-4fd95 1/1 Running 0 51m
cortx-consul-client-68kkv 1/1 Running 0 51m
cortx-consul-client-7t629 1/1 Running 0 52m
cortx-consul-client-dt67m 1/1 Running 0 51m
cortx-consul-client-vhcqs 1/1 Running 0 52m
cortx-consul-client-zp57r 1/1 Running 0 51m
cortx-consul-server-0 1/1 Running 0 50m
cortx-consul-server-1 1/1 Running 0 51m
cortx-consul-server-2 1/1 Running 0 52m
cortx-control-6555bcd848-s4h8g 1/1 Running 0 49m
cortx-data-ssc-vm-g2-rhev4-1630-856bb78668-7kbjx 4/4 Running 0 48m
cortx-data-ssc-vm-g2-rhev4-1631-84b56d8955-6n4qg 4/4 Running 0 48m
cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp 4/4 Running 0 48m
cortx-data-ssc-vm-g2-rhev4-1635-7bb4cc8b75-vqdq2 4/4 Running 0 2m52s
cortx-data-ssc-vm-g2-rhev4-2237-86787d97f8-2rs5b 4/4 Running 0 48m
cortx-data-ssc-vm-g2-rhev4-2238-777c6f78cf-md6ph 4/4 Running 0 48m
cortx-ha-5769c7f7cc-zcg4m 3/3 Running 0 44m
cortx-kafka-0 1/1 Running 0 53m
cortx-kafka-1 1/1 Running 0 53m
cortx-kafka-2 1/1 Running 0 53m
cortx-server-ssc-vm-g2-rhev4-1630-5fd67bb9b8-t46m4 2/2 Running 0 46m
cortx-server-ssc-vm-g2-rhev4-1631-77d89d568b-gd5mc 2/2 Running 0 46m
cortx-server-ssc-vm-g2-rhev4-1632-5597c88b68-r58rd 2/2 Running 0 46m
cortx-server-ssc-vm-g2-rhev4-1635-8667f864b6-t65gt 2/2 Running 0 46m
cortx-server-ssc-vm-g2-rhev4-2237-8b4445548-vvv94 2/2 Running 0 46m
cortx-server-ssc-vm-g2-rhev4-2238-5d6f9f6d9d-v9tc7 2/2 Running 0 46m
cortx-zookeeper-0 1/1 Running 0 53m
cortx-zookeeper-1 1/1 Running 0 53m
cortx-zookeeper-2 1/1 Running 0 53m
[root@ssc-vm-g2-rhev4-1630 ~]#
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1630 /]# consul kv get -recurse | grep STOPPED
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1630 /]#
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 cp file_1G s3://test/file_1G_3
upload: ./file_1G to s3://test/file_1G_3
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 ls s3://test
2022-06-28 10:30:59 1073741824 file_1G
2022-06-28 11:02:21 1073741824 file_1G_2
2022-06-28 11:09:04 1073741824 file_1G_3
[root@ssc-vm-g2-rhev4-1630 ~]#
# Read file written in degraded mode
[root@ssc-vm-g2-rhev4-1630 ~]# aws s3 cp s3://test/file_1G_2 file_1G_2_read_after_node_restart
download: s3://test/file_1G_2 to ./file_1G_2_read_after_node_restart
[root@ssc-vm-g2-rhev4-1630 ~]# diff file_1G_2 file_1G_2_read_after_node_restart
[root@ssc-vm-g2-rhev4-1630 ~]#
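For 1 GB objects, a digest comparison is cheaper than a byte-wise diff; a small sketch:

```
# Sketch: integrity check of the object written in degraded mode, by digest.
md5sum file_1G_2 file_1G_2_read_after_node_restart
# Matching digests confirm the degraded-mode write read back intact.
```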
# Continuous IO test with process restart
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl exec -it cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp -c cortx-motr-io-001
error: you must specify at least one command for the container
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl exec -it cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp -c cortx-motr-io-001 -- /bin/bash
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1632 /]# ps -aux | grep m0d
root 36 0.0 0.0 12408 1708 ? S 16:21 0:00 /usr/bin/bash /usr/libexec/cortx-motr/motr-start m0d-0x7200000000000001:0x15
root 37 71.3 2.4 32537996 396412 ? Sl 16:21 36:35 /usr/bin/m0d -e libfab:inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@21001 -A linuxstob:/etc/cortx/log/motr/e35889b2e1f4b04e18e9e501e6368966/addb/m0d-0x7200000000000001:0x15/addb-stobs -f <0x7200000000000001:0x15> -T ad -S stobs -D db -m 524288 -q 64 -E 32 -J 64 -H inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-1632@22001 -U -B /dev/sdc -z 26843545600 -r 134217728
root 300 0.0 0.0 9204 752 pts/0 S+ 17:12 0:00 grep --color=auto m0d
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1632 /]# kill -9 37
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1632 /]# command terminated with exit code 137
[root@ssc-vm-g2-rhev4-1630 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cortx-consul-client-4fd95 1/1 Running 0 57m
cortx-consul-client-68kkv 1/1 Running 0 57m
cortx-consul-client-7t629 1/1 Running 0 58m
cortx-consul-client-dt67m 1/1 Running 0 57m
cortx-consul-client-vhcqs 1/1 Running 0 58m
cortx-consul-client-zp57r 1/1 Running 0 57m
cortx-consul-server-0 1/1 Running 0 56m
cortx-consul-server-1 1/1 Running 0 57m
cortx-consul-server-2 1/1 Running 0 58m
cortx-control-6555bcd848-s4h8g 1/1 Running 0 55m
cortx-data-ssc-vm-g2-rhev4-1630-856bb78668-7kbjx 4/4 Running 0 54m
cortx-data-ssc-vm-g2-rhev4-1631-84b56d8955-6n4qg 4/4 Running 0 54m
cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp 4/4 Running 1 (64s ago) 54m
cortx-data-ssc-vm-g2-rhev4-1635-7bb4cc8b75-vqdq2 4/4 Running 0 8m56s
cortx-data-ssc-vm-g2-rhev4-2237-86787d97f8-2rs5b 4/4 Running 0 54m
cortx-data-ssc-vm-g2-rhev4-2238-777c6f78cf-md6ph 4/4 Running 0 54m
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-1630 /]# consul kv get -recurse | grep process_restart
cortx-data-headless-svc-ssc-vm-g2-rhev4-1630/process_restarts/0x7200000000000001:0x0:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1630/process_restarts/0x7200000000000001:0x1:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1630/process_restarts/0x7200000000000001:0x1b:"1"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1630/process_restarts/0x7200000000000001:0x2:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1630/process_restarts/0x7200000000000001:0x3:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1631/process_restarts/0x7200000000000001:0x4:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1631/process_restarts/0x7200000000000001:0x5:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1631/process_restarts/0x7200000000000001:0x6:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1631/process_restarts/0x7200000000000001:0x7:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1632/process_restarts/0x7200000000000001:0x14:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1632/process_restarts/0x7200000000000001:0x15:"3"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1632/process_restarts/0x7200000000000001:0x16:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1632/process_restarts/0x7200000000000001:0x17:"2"
**cortx-data-headless-svc-ssc-vm-g2-rhev4-1635/process_restarts/0x7200000000000001:0xc:"3"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1635/process_restarts/0x7200000000000001:0xd:"3"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1635/process_restarts/0x7200000000000001:0xe:"3"
cortx-data-headless-svc-ssc-vm-g2-rhev4-1635/process_restarts/0x7200000000000001:0xf:"3"**
cortx-data-headless-svc-ssc-vm-g2-rhev4-2237/process_restarts/0x7200000000000001:0x10:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-2237/process_restarts/0x7200000000000001:0x11:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-2237/process_restarts/0x7200000000000001:0x12:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-2237/process_restarts/0x7200000000000001:0x13:"2"
cortx-data-headless-svc-ssc-vm-g2-rhev4-2238/process_restarts/0x7200000000000001:0x8:"2"
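In effect, this restart test kills the m0d inside the container while IO runs; roughly (a sketch: the background cp loop is hypothetical and stands in for the continuous IO, and pkill is assumed present in the container, otherwise use ps + kill as above):

```
# Sketch: restart an ioservice under load and watch the pod restart counter.
( while :; do aws s3 cp file_1G s3://test/io_$RANDOM || break; done ) &
kubectl exec cortx-data-ssc-vm-g2-rhev4-1632-c4d475646-2cnbp -c cortx-motr-io-001 -- \
  pkill -9 m0d                        # container exits; kubelet restarts it
kubectl get pods | grep cortx-data    # expect RESTARTS to increment on the affected pod
```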
15N deployment with regular and degraded IO
[root@ssc-vm-g2-rhev4-3031 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cortx-consul-client-4hzrb 1/1 Running 0 24m
cortx-consul-client-5qcsx 1/1 Running 0 24m
cortx-consul-client-6lbfr 1/1 Running 0 23m
cortx-consul-client-6n48h 1/1 Running 0 24m
cortx-consul-client-7tkm6 1/1 Running 0 23m
cortx-consul-client-82q62 1/1 Running 0 24m
cortx-consul-client-98fbz 1/1 Running 0 23m
cortx-consul-client-9mnjj 1/1 Running 0 24m
cortx-consul-client-kskcz 1/1 Running 0 23m
cortx-consul-client-mrlt5 1/1 Running 0 23m
cortx-consul-client-p72h5 1/1 Running 0 24m
cortx-consul-client-qmrh4 1/1 Running 0 23m
cortx-consul-client-s9wq6 1/1 Running 0 23m
cortx-consul-client-w8hgx 1/1 Running 0 23m
cortx-consul-client-wg5bt 1/1 Running 0 23m
cortx-consul-server-0 1/1 Running 0 23m
cortx-consul-server-1 1/1 Running 0 24m
cortx-consul-server-2 1/1 Running 0 24m
cortx-control-76df54775b-jmg5n 1/1 Running 0 22m
cortx-data-ssc-vm-g2-rhev4-3031-6b599df947-j2zxx 4/4 Running 0 20m
cortx-data-ssc-vm-g2-rhev4-3156-557664b9d7-qm9lv 4/4 Running 0 20m
cortx-data-ssc-vm-g2-rhev4-3157-5b48475f6d-slv84 4/4 Running 0 20m
cortx-data-ssc-vm-g2-rhev4-3158-7cbf676885-mksfz 4/4 Running 0 20m
cortx-data-ssc-vm-g2-rhev4-3159-8557b9d6bb-q94f9 4/4 Running 0 20m
cortx-data-ssc-vm-g2-rhev4-3160-7d796bf9f8-r5cs6 4/4 Running 0 20m
cortx-data-ssc-vm-g2-rhev4-3166-98ccb94b4-hn7mj 4/4 Running 0 20m
cortx-data-ssc-vm-g2-rhev4-3167-57fc4c9799-hhntj 4/4 Running 0 20m
cortx-data-ssc-vm-g2-rhev4-3168-5d67b5d498-xlzbz 4/4 Running 0 20m
cortx-data-ssc-vm-g2-rhev4-3169-69f4798f85-g7m2v 4/4 Running 0 20m
cortx-data-ssc-vm-g2-rhev4-3170-85b4cf4cc7-bghbc 4/4 Running 0 20m
cortx-data-ssc-vm-g4-rhev4-1583-8479f977f-st2gm 4/4 Running 0 20m
cortx-data-ssc-vm-g4-rhev4-1590-74687dd45f-28f4j 4/4 Running 0 20m
cortx-data-ssc-vm-g4-rhev4-1591-97b6d94d4-h4s6f 4/4 Running 0 20m
cortx-data-ssc-vm-g4-rhev4-1592-6f75c67fbf-pnlth 4/4 Running 0 20m
cortx-ha-6c856f7d6c-m5hn4 3/3 Running 0 10m
cortx-kafka-0 1/1 Running 1 (25m ago) 26m
cortx-kafka-1 1/1 Running 1 (25m ago) 26m
cortx-kafka-2 1/1 Running 2 (25m ago) 26m
cortx-server-ssc-vm-g2-rhev4-3031-6c4ddbf8bb-wwql7 2/2 Running 0 16m
cortx-server-ssc-vm-g2-rhev4-3156-f67bbbddb-vlr8t 2/2 Running 1 (5m25s ago) 16m
cortx-server-ssc-vm-g2-rhev4-3157-5bcb46c47b-vvnqd 2/2 Running 0 16m
cortx-server-ssc-vm-g2-rhev4-3158-5689c6d966-f6nst 2/2 Running 0 16m
cortx-server-ssc-vm-g2-rhev4-3159-7f567ddc56-847m4 2/2 Running 0 16m
cortx-server-ssc-vm-g2-rhev4-3160-767789996-fblkr 2/2 Running 0 16m
cortx-server-ssc-vm-g2-rhev4-3166-87fdcb968-z9skn 2/2 Running 0 16m
cortx-server-ssc-vm-g2-rhev4-3167-659fbbb55d-f86l2 2/2 Running 1 (5m25s ago) 16m
cortx-server-ssc-vm-g2-rhev4-3168-79fd6d98d4-hsxlh 2/2 Running 0 16m
cortx-server-ssc-vm-g2-rhev4-3169-6bb7f8d54b-jxhdd 2/2 Running 0 16m
cortx-server-ssc-vm-g2-rhev4-3170-8577748546-g9r6j 2/2 Running 0 16m
cortx-server-ssc-vm-g4-rhev4-1583-8699c7cfcd-gf9n8 2/2 Running 0 16m
cortx-server-ssc-vm-g4-rhev4-1590-6585896df4-xlh7r 2/2 Running 0 16m
cortx-server-ssc-vm-g4-rhev4-1591-79fcf98c5b-v7xkc 2/2 Running 1 (5m24s ago) 16m
cortx-server-ssc-vm-g4-rhev4-1592-66fc68d58-wkn8h 2/2 Running 0 16m
cortx-zookeeper-0 1/1 Running 0 26m
cortx-zookeeper-1 1/1 Running 0 26m
cortx-zookeeper-2 1/1 Running 0 26m
[root@ssc-vm-g2-rhev4-3031 ~]# kubectl exec -it cortx-data-ssc-vm-g2-rhev4-3031-6b599df947-j2zxx -c cortx-hax -- /bin/bash
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]# consul kv get -recurse processes | egrep 'STOPPED|STARTING|STOPPING'
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]# hctl status -d
Bytecount:
critical : 0
damaged : 0
degraded : 0
healthy : 0
Data pool:
# fid name
0x6f00000000000001:0x0 'storage-set-1__sns'
Profile:
# fid name: pool(s)
0x7000000000000001:0x0 'Profile_the_pool': 'storage-set-1__sns' 'storage-set-1__dix' None
Services:
cortx-data-headless-svc-ssc-vm-g2-rhev4-3168
[started] hax 0x7200000000000001:0x0 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3168@22001
[started] ioservice 0x7200000000000001:0x1 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3168@21001
[started] ioservice 0x7200000000000001:0x2 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3168@21002
[started] confd 0x7200000000000001:0x3 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3168@22002
cortx-data-headless-svc-ssc-vm-g4-rhev4-1590
[started] hax 0x7200000000000001:0x4 inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1590@22001
[started] ioservice 0x7200000000000001:0x5 inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1590@21001
[started] ioservice 0x7200000000000001:0x6 inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1590@21002
[started] confd 0x7200000000000001:0x7 inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1590@22002
cortx-server-headless-svc-ssc-vm-g2-rhev4-3031
[started] hax 0x7200000000000001:0x3e inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3031@22001
[started] rgw_s3 0x7200000000000001:0x3f inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3031@21001
cortx-server-headless-svc-ssc-vm-g2-rhev4-3159
[started] hax 0x7200000000000001:0x40 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3159@22001
[started] rgw_s3 0x7200000000000001:0x41 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3159@21001
cortx-server-headless-svc-ssc-vm-g4-rhev4-1591
[started] hax 0x7200000000000001:0x42 inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1591@22001
[started] rgw_s3 0x7200000000000001:0x43 inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1591@21001
cortx-server-headless-svc-ssc-vm-g2-rhev4-3168
[started] hax 0x7200000000000001:0x44 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3168@22001
[started] rgw_s3 0x7200000000000001:0x45 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3168@21001
cortx-server-headless-svc-ssc-vm-g4-rhev4-1592
[started] hax 0x7200000000000001:0x46 inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1592@22001
[started] rgw_s3 0x7200000000000001:0x47 inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1592@21001
cortx-server-headless-svc-ssc-vm-g2-rhev4-3156
[started] hax 0x7200000000000001:0x48 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3156@22001
[started] rgw_s3 0x7200000000000001:0x49 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3156@21001
cortx-server-headless-svc-ssc-vm-g4-rhev4-1590
[started] hax 0x7200000000000001:0x4a inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1590@22001
[started] rgw_s3 0x7200000000000001:0x4b inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1590@21001
cortx-server-headless-svc-ssc-vm-g2-rhev4-3167
[started] hax 0x7200000000000001:0x4c inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3167@22001
[started] rgw_s3 0x7200000000000001:0x4d inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3167@21001
cortx-server-headless-svc-ssc-vm-g4-rhev4-1583
[started] hax 0x7200000000000001:0x4e inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1583@22001
[started] rgw_s3 0x7200000000000001:0x4f inet:tcp:cortx-server-headless-svc-ssc-vm-g4-rhev4-1583@21001
cortx-server-headless-svc-ssc-vm-g2-rhev4-3170
[started] hax 0x7200000000000001:0x50 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3170@22001
[started] rgw_s3 0x7200000000000001:0x51 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3170@21001
cortx-server-headless-svc-ssc-vm-g2-rhev4-3160
[started] hax 0x7200000000000001:0x52 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3160@22001
[started] rgw_s3 0x7200000000000001:0x53 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3160@21001
cortx-server-headless-svc-ssc-vm-g2-rhev4-3157
[started] hax 0x7200000000000001:0x54 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3157@22001
[started] rgw_s3 0x7200000000000001:0x55 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3157@21001
cortx-server-headless-svc-ssc-vm-g2-rhev4-3166
[started] hax 0x7200000000000001:0x56 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3166@22001
[started] rgw_s3 0x7200000000000001:0x57 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3166@21001
cortx-server-headless-svc-ssc-vm-g2-rhev4-3169
[started] hax 0x7200000000000001:0x58 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3169@22001
[started] rgw_s3 0x7200000000000001:0x59 inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3169@21001
cortx-data-headless-svc-ssc-vm-g2-rhev4-3156
[started] hax 0x7200000000000001:0x8 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3156@22001
[started] ioservice 0x7200000000000001:0x9 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3156@21001
[started] ioservice 0x7200000000000001:0xa inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3156@21002
[started] confd 0x7200000000000001:0xb inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3156@22002
cortx-data-headless-svc-ssc-vm-g2-rhev4-3160
[started] hax 0x7200000000000001:0xc inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3160@22001
[started] ioservice 0x7200000000000001:0xd inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3160@21001
[started] ioservice 0x7200000000000001:0xe inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3160@21002
[started] confd 0x7200000000000001:0xf inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3160@22002
cortx-data-headless-svc-ssc-vm-g2-rhev4-3159
[started] hax 0x7200000000000001:0x10 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3159@22001
[started] ioservice 0x7200000000000001:0x11 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3159@21001
[started] ioservice 0x7200000000000001:0x12 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3159@21002
[started] confd 0x7200000000000001:0x13 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3159@22002
cortx-data-headless-svc-ssc-vm-g2-rhev4-3031
[started] hax 0x7200000000000001:0x14 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3031@22001
[started] ioservice 0x7200000000000001:0x15 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3031@21001
[started] ioservice 0x7200000000000001:0x16 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3031@21002
[started] confd 0x7200000000000001:0x17 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3031@22002
cortx-data-headless-svc-ssc-vm-g2-rhev4-3158
[started] hax 0x7200000000000001:0x18 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3158@22001
[started] ioservice 0x7200000000000001:0x19 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3158@21001
[started] ioservice 0x7200000000000001:0x1a inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3158@21002
[started] confd 0x7200000000000001:0x1b inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3158@22002
cortx-data-headless-svc-ssc-vm-g2-rhev4-3169
[started] hax 0x7200000000000001:0x1c inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3169@22001
[started] ioservice 0x7200000000000001:0x1d inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3169@21001
[started] ioservice 0x7200000000000001:0x1e inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3169@21002
[started] confd 0x7200000000000001:0x1f inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3169@22002
cortx-data-headless-svc-ssc-vm-g4-rhev4-1591 (RC)
[started] hax 0x7200000000000001:0x20 inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1591@22001
[started] ioservice 0x7200000000000001:0x21 inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1591@21001
[started] ioservice 0x7200000000000001:0x22 inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1591@21002
[started] confd 0x7200000000000001:0x23 inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1591@22002
cortx-data-headless-svc-ssc-vm-g2-rhev4-3170
[started] hax 0x7200000000000001:0x24 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3170@22001
[started] ioservice 0x7200000000000001:0x25 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3170@21001
[started] ioservice 0x7200000000000001:0x26 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3170@21002
[started] confd 0x7200000000000001:0x27 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3170@22002
cortx-data-headless-svc-ssc-vm-g2-rhev4-3167
[started] hax 0x7200000000000001:0x28 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3167@22001
[started] ioservice 0x7200000000000001:0x29 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3167@21001
[started] ioservice 0x7200000000000001:0x2a inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3167@21002
[started] confd 0x7200000000000001:0x2b inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3167@22002
cortx-data-headless-svc-ssc-vm-g4-rhev4-1583
[started] hax 0x7200000000000001:0x2c inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1583@22001
[started] ioservice 0x7200000000000001:0x2d inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1583@21001
[started] ioservice 0x7200000000000001:0x2e inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1583@21002
[started] confd 0x7200000000000001:0x2f inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1583@22002
cortx-data-headless-svc-ssc-vm-g2-rhev4-3157
[started] hax 0x7200000000000001:0x30 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3157@22001
[started] ioservice 0x7200000000000001:0x31 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3157@21001
[started] ioservice 0x7200000000000001:0x32 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3157@21002
[started] confd 0x7200000000000001:0x33 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3157@22002
cortx-data-headless-svc-ssc-vm-g2-rhev4-3166
[started] hax 0x7200000000000001:0x34 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3166@22001
[started] ioservice 0x7200000000000001:0x35 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3166@21001
[started] ioservice 0x7200000000000001:0x36 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3166@21002
[started] confd 0x7200000000000001:0x37 inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-3166@22002
cortx-data-headless-svc-ssc-vm-g4-rhev4-1592
[started] hax 0x7200000000000001:0x38 inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1592@22001
[started] ioservice 0x7200000000000001:0x39 inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1592@21001
[started] ioservice 0x7200000000000001:0x3a inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1592@21002
[started] confd 0x7200000000000001:0x3b inet:tcp:cortx-data-headless-svc-ssc-vm-g4-rhev4-1592@22002
cortx-server-headless-svc-ssc-vm-g2-rhev4-3158
[started] hax 0x7200000000000001:0x3c inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3158@22001
[started] rgw_s3 0x7200000000000001:0x3d inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-3158@21001
Devices:
cortx-data-headless-svc-ssc-vm-g2-rhev4-3168
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdf
[online] /dev/sdg
[online] /dev/sdh
cortx-data-headless-svc-ssc-vm-g4-rhev4-1590
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-server-headless-svc-ssc-vm-g2-rhev4-3031
cortx-server-headless-svc-ssc-vm-g2-rhev4-3159
cortx-server-headless-svc-ssc-vm-g4-rhev4-1591
cortx-server-headless-svc-ssc-vm-g2-rhev4-3168
cortx-server-headless-svc-ssc-vm-g4-rhev4-1592
cortx-server-headless-svc-ssc-vm-g2-rhev4-3156
cortx-server-headless-svc-ssc-vm-g4-rhev4-1590
cortx-server-headless-svc-ssc-vm-g2-rhev4-3167
cortx-server-headless-svc-ssc-vm-g4-rhev4-1583
cortx-server-headless-svc-ssc-vm-g2-rhev4-3170
cortx-server-headless-svc-ssc-vm-g2-rhev4-3160
cortx-server-headless-svc-ssc-vm-g2-rhev4-3157
cortx-server-headless-svc-ssc-vm-g2-rhev4-3166
cortx-server-headless-svc-ssc-vm-g2-rhev4-3169
cortx-data-headless-svc-ssc-vm-g2-rhev4-3156
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g2-rhev4-3160
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g2-rhev4-3159
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g2-rhev4-3031
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g2-rhev4-3158
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g2-rhev4-3169
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g4-rhev4-1591
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g2-rhev4-3170
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g2-rhev4-3167
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g4-rhev4-1583
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g2-rhev4-3157
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g2-rhev4-3166
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-data-headless-svc-ssc-vm-g4-rhev4-1592
[online] /dev/sdd
[online] /dev/sde
[online] /dev/sdc
[online] /dev/sdg
[online] /dev/sdh
[online] /dev/sdf
cortx-server-headless-svc-ssc-vm-g2-rhev4-3158
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]#
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]#
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]#
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]#
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]# hctl status -d | egrep 'offline|recovering|UNKNOWN|unknown'
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3031 /]#
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 cp 1G s3://test
upload: ./1G to s3://test/1G
[root@ssc-vm-g2-rhev4-3031 ~]#
# Degraded write
[root@ssc-vm-g2-rhev4-3031 ~]# kubectl get deployment cortx-data-ssc-vm-g2-rhev4-3159 -o yaml > cortx-data-ssc-vm-g2-rhev4-3159.yaml
[root@ssc-vm-g2-rhev4-3031 ~]# kubectl delete deployment cortx-data-ssc-vm-g2-rhev4-3159
deployment.apps "cortx-data-ssc-vm-g2-rhev4-3159" deleted
[root@ssc-vm-g2-rhev4-3031 ~]#
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-3031 /]# consul kv get -recurse | grep STOPPED
cortx-data-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0x8:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-data-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3031/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3156/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3157/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3158/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3159/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3159/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3159/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3160/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3166/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3167/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3168/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3169/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g2-rhev4-3170/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1583/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1590/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1591/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
cortx-server-headless-svc-ssc-vm-g4-rhev4-1592/processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0x9:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0xa:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0xb:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-3031 /]# exit
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 ls
2022-06-28 17:51:24 test
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 ls s3://test
2022-06-28 17:52:54 1073741824 1G
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 cp 1G s3://test/1G_2
upload: ./1G to s3://test/1G_2
[root@ssc-vm-g2-rhev4-3031 ~]#
# Read after restart
[root@ssc-vm-g2-rhev4-3031 ~]# kubectl apply -f cortx-data-ssc-vm-g2-rhev4-3159.yaml
deployment.apps/cortx-data-ssc-vm-g2-rhev4-3159 created
[root@ssc-vm-g2-rhev4-3031 ~]#
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-3031 /]# consul kv get -recurse | grep STOPPED
[root@cortx-server-headless-svc-ssc-vm-g2-rhev4-3031 /]#
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 ls s3://test
2022-06-28 17:52:54 1073741824 1G
2022-06-28 18:03:05 1073741824 1G_2
[root@ssc-vm-g2-rhev4-3031 ~]# aws s3 cp s3://test/1G_2 ./
download: s3://test/1G_2 to ./1G_2
[root@ssc-vm-g2-rhev4-3031 ~]#
[root@ssc-vm-g2-rhev4-3031 ~]# diff 1G 1G_2
[root@ssc-vm-g2-rhev4-3031 ~]#
Tested a 6N deployment without DTM enabled. Ref. custom build at https://eos-jenkins.colo.seagate.com/job/GitHub-custom-ci-builds/job/generic/job/custom-ci/6991. Deployment completed successfully with SNS: 4+2+0 and DIX: 1+4+0 config. Manually tested IOs: happy-path IOs worked fine, and degraded write works fine.
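As an aside on the durability notation above (my reading, not part of the test log): N+K+S means N data units, K parity units, and S spare units per parity group, so SNS 4+2+0 should tolerate up to 2 concurrent unit failures and DIX 1+4+0 up to 4. A tiny illustrative helper:

```python
# Illustrative only: interpret a Motr-style durability string "N+K+S"
# (N data units, K parity units, S spare units per parity group).
def parse_durability(spec: str) -> dict:
    data, parity, spare = (int(x) for x in spec.split('+'))
    return {
        'data_units': data,
        'parity_units': parity,
        'spare_units': spare,
        # A parity group survives up to K unit failures.
        'max_unit_failures': parity,
    }


if __name__ == '__main__':
    print(parse_durability('4+2+0'))  # SNS config in this test
    print(parse_durability('1+4+0'))  # DIX config in this test
```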
[root@ssc-vm-g4-rhev4-1587 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cortx-consul-client-5clwp 1/1 Running 0 19m
cortx-consul-client-5k9hh 1/1 Running 0 19m
cortx-consul-client-cjbzx 1/1 Running 0 18m
cortx-consul-client-z47xc 1/1 Running 0 19m
cortx-consul-client-zzks4 1/1 Running 0 18m
cortx-consul-server-0 1/1 Running 0 18m
cortx-consul-server-1 1/1 Running 0 18m
cortx-consul-server-2 1/1 Running 0 19m
cortx-control-68f7dbd6bd-rxlqp 1/1 Running 0 16m
cortx-data-ssc-vm-g4-rhev4-1588-76dfcb5848-znr6l 4/4 Running 0 15m
cortx-data-ssc-vm-g4-rhev4-1589-589b684995-slv4c 4/4 Running 0 15m
cortx-data-ssc-vm-rhev4-2450-6896d75df9-w7qss 4/4 Running 0 15m
cortx-data-ssc-vm-rhev4-2451-795986bb96-dznqr 4/4 Running 0 15m
cortx-data-ssc-vm-rhev4-2635-df564987b-78wg8 4/4 Running 0 15m
cortx-ha-d9ff49645-rvz9x 3/3 Running 0 10m
cortx-kafka-0 1/1 Running 0 21m
cortx-kafka-1 1/1 Running 1 (20m ago) 21m
cortx-kafka-2 1/1 Running 0 21m
cortx-server-ssc-vm-g4-rhev4-1588-64d958584f-45b6r 2/2 Running 0 13m
cortx-server-ssc-vm-g4-rhev4-1589-7f54f49d6c-vm9tn 2/2 Running 0 12m
cortx-server-ssc-vm-rhev4-2450-8d69fb6b7-mnl9j 2/2 Running 0 12m
cortx-server-ssc-vm-rhev4-2451-794df4997c-nwzvs 2/2 Running 0 12m
cortx-server-ssc-vm-rhev4-2635-5dbf9c9665-4hc6l 2/2 Running 0 12m
cortx-zookeeper-0 1/1 Running 0 21m
cortx-zookeeper-1 1/1 Running 0 21m
cortx-zookeeper-2 1/1 Running 0 21m
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 mb s3://test --endpoint-url http://$IP:$Port
make_bucket: test
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 cp file_1gb s3://test/file_1gb --endpoint-url http://$IP:$Port
upload: ./file_1gb to s3://test/file_1gb
###Degraded Write:
[root@ssc-vm-g4-rhev4-1587 ~]# kubectl scale deploy cortx-data-ssc-vm-rhev4-2450 --replicas 0
deployment.apps/cortx-data-ssc-vm-rhev4-2450 scaled
[root@cortx-data-headless-svc-ssc-vm-g4-rhev4-1588 9a5a467bbc2a4aa2ef3b12142e1598cb]# consul kv get -recurse processes | grep STOPPED
processes/0x7200000000000001:0x4:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_HA"}
processes/0x7200000000000001:0x5:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0x6:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
processes/0x7200000000000001:0x7:{"state": "M0_CONF_HA_PROCESS_STOPPED", "type": "M0_CONF_HA_PROCESS_M0D"}
[root@cortx-data-headless-svc-ssc-vm-g4-rhev4-1588 9a5a467bbc2a4aa2ef3b12142e1598cb]# hctl status -d | grep offline
[offline] hax 0x7200000000000001:0x4 inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2450@22001
[offline] ioservice 0x7200000000000001:0x5 inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2450@21001
[offline] ioservice 0x7200000000000001:0x6 inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2450@21002
[offline] confd 0x7200000000000001:0x7 inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2450@22002
[offline] /dev/sdd
[offline] /dev/sde
[offline] /dev/sdc
[offline] /dev/sdg
[offline] /dev/sdh
[offline] /dev/sdf
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 ls s3://test/ --endpoint-url http://$IP:$Port
2022-06-29 03:34:53 1048576000 file_1gb
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 cp file_1gb s3://test/file_1gb_2 --endpoint-url http://$IP:$Port
upload: ./file_1gb to s3://test/file_1gb_2
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 ls s3://test/ --endpoint-url http://$IP:$Port
2022-06-29 03:34:53 1048576000 file_1gb
2022-06-29 03:46:58 1048576000 file_1gb_2
###Restarting failed pod:
[root@ssc-vm-g4-rhev4-1587 ~]# kubectl scale deploy cortx-data-ssc-vm-rhev4-2450 --replicas 1
deployment.apps/cortx-data-ssc-vm-rhev4-2450 scaled
[root@cortx-data-headless-svc-ssc-vm-g4-rhev4-1588 9a5a467bbc2a4aa2ef3b12142e1598cb]# hctl status -d | grep offline
[root@cortx-data-headless-svc-ssc-vm-g4-rhev4-1588 9a5a467bbc2a4aa2ef3b12142e1598cb]# consul kv get -recurse processes | grep STOPPED
###Tried to write new data objects:
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 cp file_1gb s3://test/file_1gb_3 --endpoint-url http://$IP:$Port
upload failed: ./file_1gb to s3://test/file_1gb_3 Read timeout on endpoint URL: "http://10.102.81.163:80/test/file_1gb_3?uploads"
[root@ssc-vm-g4-rhev4-1587 ~]# aws s3 ls s3://test/ --endpoint-url http://$IP:$Port
Read timeout on endpoint URL: "http://10.102.81.163:80/test?list-type=2&prefix=&delimiter=%2F&encoding-type=url"
Not able to write new data objects after restarting the failed pod, and also not able to read the data that was written in degraded mode.
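For reference, a hypothetical consolidation of the repro sequence above (Python, with kubectl and aws on PATH; the deployment, bucket, and endpoint values are this cluster's; the real run also polls `consul kv get -recurse processes` for the STOPPED keys between steps):

```python
# Hypothetical repro helper, not part of the PR: replays the degraded-IO
# sequence shown above. A CalledProcessError on the last two steps
# corresponds to the read timeouts observed after the pod restart.
import subprocess

DEPLOY = 'cortx-data-ssc-vm-rhev4-2450'  # data pod taken down above
ENDPOINT = 'http://10.102.81.163:80'     # S3 endpoint used above


def run(cmd: str) -> None:
    print(f'+ {cmd}')
    subprocess.run(cmd, shell=True, check=True)


def degraded_io_check() -> None:
    run(f'kubectl scale deploy {DEPLOY} --replicas 0')  # fail one data pod
    run(f'aws s3 cp file_1gb s3://test/file_1gb_dg --endpoint-url {ENDPOINT}')
    run(f'kubectl scale deploy {DEPLOY} --replicas 1')  # restart the pod
    # After restart, reads of degraded-mode data and new writes should
    # both succeed; in this run they timed out instead.
    run(f'aws s3 cp s3://test/file_1gb_dg ./file_1gb_dg --endpoint-url {ENDPOINT}')
    run(f'aws s3 cp file_1gb s3://test/file_1gb_new --endpoint-url {ENDPOINT}')


if __name__ == '__main__':
    degraded_io_check()
```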
[root@ssc-vm-g4-rhev4-1587 ~]# kubectl logs cortx-data-ssc-vm-rhev4-2635-df564987b-78wg8 --all-containers
2022-06-29 10:08:30,469 [INFO] Starting Hare services
2022-06-29 10:08:30,530 [INFO] Entering logrotate_generic at line 161 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,531 [INFO] Entering get_log_dir at line 696 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,627 [INFO] Leaving get_log_dir
2022-06-29 10:08:30,628 [INFO] Leaving logrotate_generic
2022-06-29 10:08:30,635 [INFO] Entering start_hax_and_consul_without_systemd at line 414 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,636 [INFO] Entering get_config_dir at line 704 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,710 [INFO] Leaving get_config_dir
2022-06-29 10:08:30,710 [INFO] Entering get_log_dir at line 696 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,790 [INFO] Leaving get_log_dir
2022-06-29 10:08:30,791 [INFO] Entering _start_consul at line 240 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:30,858 [INFO] Entering Utils.get_local_hostname at line 91 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/utils.py
2022-06-29 10:08:30,859 [INFO] Entering Utils.get_hostname at line 79 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/utils.py
2022-06-29 10:08:30,859 [INFO] Leaving Utils.get_hostname
2022-06-29 10:08:30,859 [INFO] Leaving Utils.get_local_hostname
2022-06-29 10:08:30,860 [INFO] Entering ConsulStarter._execute at line 65 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/consul_starter.py
2022-06-29 10:08:35,919 [INFO] Leaving _start_consul
2022-06-29 10:08:35,919 [INFO] Entering _start_hax at line 298 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/main.py
2022-06-29 10:08:35,920 [INFO] Entering HaxStarter._execute at line 54 in file /opt/seagate/cortx/hare/lib/python3.6/site-packages/hare_mp/hax_starter.py
2022-06-29 10:08:35,920 [INFO] Leaving _start_hax
2022-06-29 08:57:59,174 - executing command /usr/libexec/cortx-motr/motr-start m0d-0x7200000000000001:0x13
2022-06-29 08:57:59,248 - MOTR_M0D_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2635@22002
2022-06-29 08:57:59,249 - MOTR_PROCESS_FID: 0x7200000000000001:0x13
2022-06-29 08:57:59,249 - MOTR_HA_EP: inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2635@22001
2022-06-29 08:57:59,249 - MOTR_M0D_DATA_DIR: /etc/cortx/motr
2022-06-29 08:57:59,249 - MOTR_CONF_XC: /etc/motr/confd.xc
2022-06-29 08:57:59,266 - motr transport : libfab
2022-06-29 08:57:59,276 - Service FID: m0d-0x7200000000000001:0x13
2022-06-29 08:57:59,281 - BE log size is not configured
2022-06-29 08:57:59,281 - + exec /usr/bin/m0d -e libfab:inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2635@22002 -A linuxstob:/etc/cortx/log/motr/cc3999bb13ca76034b4dfca9adfa7f90/addb/m0d-0x7200000000000001:0x13/addb-stobs -f '<0x7200000000000001:0x13>' -T linux -S stobs -D db -m 524288 -q 64 -E 32 -J 64 -c /etc/motr/confd.xc -H inet:tcp:cortx-data-headless-svc-ssc-vm-rhev4-2635@22001 -U -r 134217728
2022-06-29 08:58:03,783 - motr[00036]: ba60 WARN [ha/entrypoint.c:563:ha_entrypoint_client_fom_tick] rlk_rc=-110
2022-06-29 08:58:07,785 - motr[00036]: ba60 WARN [ha/entrypoint.c:563:ha_entrypoint_client_fom_tick] rlk_rc=-110
2022-06-29 08:58:11,786 - motr[00036]: ba60 WARN [ha/entrypoint.c:563:ha_entrypoint_client_fom_tick] rlk_rc=-110
2022-06-29 08:58:36,474 - motr[00036]: bb90 ERROR [conf/helpers.c:552:m0_conf_process2service_get] <! rc=-2
2022-06-29 08:58:36,475 - Started
2022-06-29 08:58:36,475 - m0d: systemd notifications not allowed
2022-06-29 08:58:36,475 -
2022-06-29 08:58:36,475 - Press CTRL+C to quit.
2022-06-29 09:50:35,565 - motr[00036]: cf40 ERROR [net/ip.c:452:m0_net_hostname_to_ip] gethostbyname err=1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
2022-06-29 09:50:35,565 - motr[00036]: cf40 ERROR [net/ip.c:454:m0_net_hostname_to_ip] <! rc=1
2022-06-29 09:50:35,566 - motr[00036]: d0e0 ERROR [net/libfab/libfab.c:2261:libfab_dns_resolve_retry] gethostbyname() failed with err 1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local@21002
2022-06-29 09:52:42,844 - motr[00036]: d3c0 ERROR [rpc/frmops.c:576:item_fail] packet 0x7ff048067230, item 0x7ff0480639b0[36] failed with ri_error=-110
2022-06-29 09:52:42,861 - motr[00036]: cf10 ERROR [net/ip.c:452:m0_net_hostname_to_ip] gethostbyname err=1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
2022-06-29 09:52:42,861 - motr[00036]: cf10 ERROR [net/ip.c:454:m0_net_hostname_to_ip] <! rc=1
2022-06-29 09:52:42,861 - motr[00036]: d0b0 ERROR [net/libfab/libfab.c:2261:libfab_dns_resolve_retry] gethostbyname() failed with err 1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local@21002
2022-06-29 09:54:50,082 - motr[00036]: d390 ERROR [rpc/frmops.c:576:item_fail] packet 0x7ff048067230, item 0x7ff0480639b0[38] failed with ri_error=-110
2022-06-29 09:54:50,083 - motr[00036]: da20 ERROR [rpc/link.c:154:rpc_link_conn_terminate] Connection termination failed (rlink=0x55aa1c96fd80)
2022-06-29 09:54:50,083 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528641953] seconds in processing]: fom=0x55aa1c977318, fop 0x55aa1c9773d0[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528654010] seconds in processing]: fom=0x55aa1c99a7c8, fop 0x55aa1c99a880[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528656331] seconds in processing]: fom=0x55aa1c969138, fop 0x55aa1c9691f0[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528658287] seconds in processing]: fom=0x55aa1c9a18b8, fop 0x55aa1c9a1970[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528660188] seconds in processing]: fom=0x55aa1c97e408, fop 0x55aa1c97e4c0[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528661808] seconds in processing]: fom=0x55aa1c9854f8, fop 0x55aa1c9855b0[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528663396] seconds in processing]: fom=0x55aa1c9936d8, fop 0x55aa1c993790[0] phase: Initialised
2022-06-29 09:54:50,083 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528664791] seconds in processing]: fom=0x55aa1c98c5e8, fop 0x55aa1c98c6a0[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528666209] seconds in processing]: fom=0x55aa1c9309b8, fop 0x55aa1c930a70[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528671757] seconds in processing]: fom=0x55aa1c937aa8, fop 0x55aa1c937b60[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528673657] seconds in processing]: fom=0x55aa1c95af58, fop 0x55aa1c95b010[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528675075] seconds in processing]: fom=0x55aa1c9298c8, fop 0x55aa1c929980[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528676347] seconds in processing]: fom=0x55aa1c962048, fop 0x55aa1c962100[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528670699] seconds in processing]: fom=0x55aa1c93eb98, fop 0x55aa1c93ec50[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528671883] seconds in processing]: fom=0x55aa1c945c88, fop 0x55aa1c945d40[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528673239] seconds in processing]: fom=0x55aa1c953e68, fop 0x55aa1c953f20[0] phase: Initialised
2022-06-29 09:54:50,084 - motr[00036]: d9f0 WARN [fop/fom.c:362:hung_fom_notify] FOP HUNG[[254:528674765] seconds in processing]: fom=0x55aa1c94cd78, fop 0x55aa1c94ce30[0] phase: Initialised
2022-06-29 09:54:50,092 - motr[00036]: cf40 ERROR [net/ip.c:452:m0_net_hostname_to_ip] gethostbyname err=1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
2022-06-29 09:54:50,092 - motr[00036]: cf40 ERROR [net/ip.c:454:m0_net_hostname_to_ip] <! rc=1
2022-06-29 09:54:50,092 - motr[00036]: d0e0 ERROR [net/libfab/libfab.c:2261:libfab_dns_resolve_retry] gethostbyname() failed with err 1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local@21002
2022-06-29 09:56:57,307 - motr[00036]: d3c0 ERROR [rpc/frmops.c:576:item_fail] packet 0x7ff048044210, item 0x7ff0480639b0[36] failed with ri_error=-110
2022-06-29 09:56:57,339 - motr[00036]: cf10 ERROR [net/ip.c:452:m0_net_hostname_to_ip] gethostbyname err=1 for 172-16-18-229.cortx-data-headless-svc-ssc-vm-rhev4-2450.cortx.svc.cluster.local
Found motr panic and errors in hare-hax logs:
motr[00093]: 5900 FATAL [lib/assert.c:50:m0_panic] panic: fatal signal delivered at unknown() (unknown:0) [git: 2.0.0-837-13-g77231467] /etc/cortx/hare/config/9a5a467bbc2a4aa2ef3b12142e1598cb/m0trace.93.2022-06-29-08:58:08
Motr panic: fatal signal delivered at unknown() unknown:0 (errno: 0) (last failed: none) [git: 2.0.0-837-13-g77231467] pid: 93 /etc/cortx/hare/config/9a5a467bbc2a4aa2ef3b12142e1598cb/m0trace.93.2022-06-29-08:58:08
Motr panic reason: signo: 11
/lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7fe5799766f3]
/lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7fe5799768c9]
/lib64/libmotr.so.2(m0_panic+0x13d)[0x7fe5799652cd]
/lib64/libmotr.so.2(+0x3a091c)[0x7fe57997691c]
/lib64/libpthread.so.0(+0x12b30)[0x7fe5821edb30]
/lib64/libmotr.so.2(m0_tlist_next+0xc)[0x7fe57996d12c]
/lib64/libmotr.so.2(+0x424dbe)[0x7fe5799fadbe]
/lib64/libmotr.so.2(m0_rpc_frm_enq_item+0x300)[0x7fe5799fb960]
/lib64/libmotr.so.2(m0_rpc_item_send+0x13c)[0x7fe579a009fc]
/lib64/libmotr.so.2(m0_rpc__post_locked+0x167)[0x7fe579a053f7]
/lib64/libmotr.so.2(m0_rpc_post+0x99)[0x7fe579a05629]
/lib64/libmotr.so.2(+0x36cd5a)[0x7fe579942d5a]
/lib64/libmotr.so.2(+0x35fb74)[0x7fe579935b74]
/lib64/libmotr.so.2(m0_thread_trampoline+0x5e)[0x7fe57996be5e]
/lib64/libmotr.so.2(+0x3a15b1)[0x7fe5799775b1]
/lib64/libpthread.so.0(+0x815a)[0x7fe5821e315a]
/lib64/libc.so.6(clone+0x43)[0x7fe581788dd3]
Checked this thrice with different configs and faced the same issue each time. This motr panic was logged earlier in https://jts.seagate.com/browse/CORTX-31834. cc @mssawant, @vaibhavparatwar, @pavankrishnat
Tested a 6N deployment with DTM enabled. Ref. custom build at https://eos-jenkins.colo.seagate.com/job/GitHub-custom-ci-builds/job/generic/job/custom-ci/6993. Deployment completed successfully with SNS: 4+1+0 and DIX: 1+4+0 config. Started a new IOStabilityTestRuns job for degraded read type-3 at https://eos-jenkins.colo.seagate.com/job/QA/job/IOStabilityTestRuns/210. cc @mssawant, @vaibhavparatwar, @pavankrishnat
retest this please
retest this please
Created https://jts.seagate.com/browse/CORTX-33263 for the motr RPC crash seen in hax on data pod restart and failure.
retest this please
On process restart, before replying to the first entrypoint request, Hare notifies the process as M0_NC_FAILED to the rest of the Motr cluster. But this prevents DTM recovery from completing on process restart, as the process is marked FAILED rather than OFFLINE.
Solution: Notify OFFLINE instead of FAILED on process restart.
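A minimal sketch of the intended behavior change (illustrative only; the enum and function names below are hypothetical stand-ins, not Hare's actual internals):

```python
# Sketch of the fix described above, with hypothetical names.
from enum import Enum


class ProcState(Enum):
    # "FAILED" and "OFFLINE" here stand in for the corresponding Motr
    # M0_NC_* HA states; the exact mapping is omitted.
    ONLINE = 'online'
    OFFLINE = 'offline'  # transient: the process is expected back
    FAILED = 'failed'    # terminal: blocks DTM recovery from completing


def broadcast_state(fid: str, state: ProcState) -> None:
    # Placeholder: Hare would send an HA note to the rest of the cluster.
    print(f'HA note: process {fid} -> {state.value}')


def reply_entrypoint(fid: str) -> None:
    # Placeholder for the actual entrypoint reply.
    print(f'entrypoint reply sent to {fid}')


def on_first_entrypoint_request(fid: str, is_restart: bool) -> None:
    """Handle the first entrypoint request from a (re)starting process."""
    if is_restart:
        # Before the fix, FAILED was broadcast here, so peers treated the
        # process as permanently gone and DTM recovery never completed.
        broadcast_state(fid, ProcState.OFFLINE)
    reply_entrypoint(fid)


if __name__ == '__main__':
    on_first_entrypoint_request('0x7200000000000001:0x13', is_restart=True)
```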
Signed-off-by: Mandar Sawant mandar.sawant@seagate.com