mssawant closed this 2 years ago.
Tested with ongoing I/O under m0d failures and RGW failures; no issues were observed.
[root@ssc-vm-g4-rhev4-0717 ~]# s3bench -accessKey=sgiamadmin -accessSecret=ldapadmin -bucket=test-40039-bucket-1-healthy-$(date +%d%m-%H%M%S) -endpoint=https://192.168.47.71:30443 -numClients=10 -numSamples=100 -objectNamePrefix=object-degraded -objectSize=16Mb -skipSSLCertVerification=True -s3MaxRetries=3 -region us-east-1 -validate -skipCleanup
Write done in 16s with 0 errors
Read done in 9s with 0 errors
Validate done in 10s with 0 errors
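When the same s3bench run is repeated many times, checking the summary lines by hand gets tedious. A minimal Python sketch, assuming the summary format printed in the run above (`<Phase> done in <N>s with <M> errors`), could parse those lines so a test script fails on any non-zero error count. `parse_s3bench_summary` and `all_phases_clean` are hypothetical helpers, not part of s3bench.

```python
import re
from typing import Dict, Tuple

# Assumed summary format, taken from the run above, e.g.
#   "Write done in 16s with 0 errors"
_SUMMARY_RE = re.compile(r"(\w+) done in (\d+)s with (\d+) errors")


def parse_s3bench_summary(output: str) -> Dict[str, Tuple[int, int]]:
    """Map each phase (Write, Read, Validate, ...) to (seconds, errors)."""
    return {
        phase: (int(seconds), int(errors))
        for phase, seconds, errors in _SUMMARY_RE.findall(output)
    }


def all_phases_clean(output: str) -> bool:
    """True only if at least one phase was found and no phase reported errors."""
    summary = parse_s3bench_summary(output)
    return bool(summary) and all(errors == 0 for _, errors in summary.values())
```

Requiring at least one parsed phase means an s3bench run that crashed before printing its summary is treated as a failure rather than as a clean run.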
With a custom build, I started a script that runs s3bench in parallel, waits some time for the I/O to complete, and then kills one of the ioservices; this repeats in a loop. So far the test has completed a few rounds and I do not see any issues:
NAME                                                READY   STATUS    RESTARTS        AGE
cortx-data-ssc-vm-g2-rhev4-3290-79ddd7768d-h6nkz    4/4     Running   4 (76m ago)     171m
cortx-data-ssc-vm-g4-rhev4-1669-5b4ff96464-t5gb9    4/4     Running   3 (8m20s ago)   171m
cortx-data-ssc-vm-g4-rhev4-1714-f854fb64c-44hhm     4/4     Running   1 (25m ago)     171m
cortx-data-ssc-vm-g4-rhev4-1715-869779dbbd-2vrfq    4/4     Running   4 (3m22s ago)   171m
cortx-data-ssc-vm-g4-rhev4-1719-69b59b9c6-l92jn     4/4     Running   1 (19m ago)     171m
cortx-ha-785fd4968f-r2bcg                           3/3     Running   0               167m
cortx-kafka-0                                       1/1     Running   2 (174m ago)    175m
cortx-kafka-1                                       1/1     Running   2 (174m ago)    175m
cortx-kafka-2                                       1/1     Running   1 (174m ago)    175m
cortx-server-ssc-vm-g2-rhev4-3290-5849954ddf-q8q9q  2/2     Running   1 (108m ago)    169m
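The fault-injection loop described in that comment can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the script that was actually run: `fault_injection_loop`, `start_io` and `kill_ioservice` are hypothetical names, and the two callables stand in for whatever launches the parallel s3bench clients and kills an ioservice.

```python
import random
import time
from typing import Callable, List, Optional, Sequence


def fault_injection_loop(
    start_io: Callable[[], None],
    kill_ioservice: Callable[[str], None],
    ioservices: Sequence[str],
    rounds: int,
    wait_seconds: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Repeatedly start I/O, wait, then kill one randomly chosen ioservice.

    start_io and kill_ioservice are hypothetical hooks; in a real run they
    would launch parallel s3bench clients and kill an ioservice process.
    Returns the ioservice killed in each round.
    """
    if not ioservices:
        raise ValueError("need at least one ioservice to kill")
    if rng is None:
        rng = random.Random()
    killed: List[str] = []
    for _ in range(rounds):
        start_io()                             # kick off the parallel s3bench runs
        time.sleep(wait_seconds)               # give the I/O time to complete
        victim = rng.choice(list(ioservices))  # pick one ioservice at random
        kill_ioservice(victim)                 # inject the failure
        killed.append(victim)
    return killed
```

Passing a seeded `random.Random` makes the sequence of killed ioservices reproducible, which helps when a failure in round N needs to be replayed.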
Commit details for reference:
[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-3290 /]# rpm -qa | grep cortx
cortx-provisioner-2.0.0-5045_325a3e0b.noarch
cortx-hare-2.0.0-6887_gitab286e2.el8.x86_64
cortx-motr-2.0.0-6887_gite1f5e80e.el8.x86_64
cortx-py-utils-2.0.0-6887_76bd9e4b.noarch
Hare sends an OFFLINE event whenever a process restarts, without checking whether the process is hax or a Motr client process. When the hax process restarts, it sends a FirstEntrypointRequest and tries to send OFFLINE for itself; since no halink is established yet, the delivery times out, which can further delay entrypoint processing and may cause other side effects. There is no need to send OFFLINE for hax and Motr client processes.
Solution: skip hax and Motr client processes when sending OFFLINE events for FirstEntrypoint requests.
Signed-off-by: Mandar Sawant <mandar.sawant@seagate.com>
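A minimal Python sketch of the filtering described in the commit message is given below. It is not Hare's actual code: `ProcessKind`, `Process`, `SKIP_OFFLINE` and `processes_to_mark_offline` are hypothetical names, and the sketch only models the decision of which restarted processes should receive an OFFLINE event when a FirstEntrypointRequest is handled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class ProcessKind(Enum):
    # Hypothetical classification; Hare's real process types may differ.
    HAX = "hax"
    MOTR_CLIENT = "motr_client"
    IOSERVICE = "ioservice"
    CONFD = "confd"


@dataclass(frozen=True)
class Process:
    fid: str
    kind: ProcessKind


# Kinds for which no OFFLINE event is sent on restart: hax has no
# established halink yet (delivery would only time out), and Motr
# client processes do not need it.
SKIP_OFFLINE = frozenset({ProcessKind.HAX, ProcessKind.MOTR_CLIENT})


def processes_to_mark_offline(processes: Iterable[Process]) -> List[Process]:
    """Return only the processes that should get an OFFLINE event."""
    return [p for p in processes if p.kind not in SKIP_OFFLINE]
```

Keeping the excluded kinds in a single `frozenset` puts the rule in one place, so excluding another process type later is a one-line change.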