Seagate / cortx-hare

CORTX Hare configures Motr object store, starts/stops Motr services, and notifies Motr of service and device faults.
https://github.com/Seagate/cortx
Apache License 2.0

CORTX-29142: Bytecount thread exiting due to HAConsistencyException #2034

Closed Shreya-18 closed 2 years ago

Shreya-18 commented 2 years ago

When Motr's byte count data is not yet available, Consul KV may not have been updated with the expected keys. Fetching those keys then raises HAConsistencyException, which aborts the byte-count-updater thread.

Such delays on the Motr side are expected, so this error can be swallowed; the updater simply tries again on its next scheduled attempt.
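
For illustration, the intended behaviour can be sketched as a periodic loop that catches the consistency error and waits for the next cycle instead of exiting. This is a minimal sketch, not the actual change to hax/bytecount.py: `run_updater`, `update_once`, and the local `HAConsistencyException` stand-in are hypothetical names, and the 30-second default interval is an assumption based only on the gap between the log lines below.

```python
import logging
import threading
from typing import Callable

log = logging.getLogger('byte-count-updater')


class HAConsistencyException(Exception):
    """Local stand-in for hax.exception.HAConsistencyException (assumption)."""


def run_updater(update_once: Callable[[], None],
                stop: threading.Event,
                interval_sec: float = 30.0) -> int:
    """Call update_once() periodically until `stop` is set.

    A HAConsistencyException is logged and swallowed so that the loop
    survives until the next attempt. Any other exception still propagates.
    Returns the number of swallowed failures.
    """
    swallowed = 0
    while not stop.is_set():
        try:
            update_once()
        except HAConsistencyException:
            swallowed += 1
            log.debug('Failed to update Consul KV due to an intermittent '
                      'error. The error is swallowed since new attempts '
                      'will be made timely')
        # Sleep until the next cycle, waking early if asked to stop.
        stop.wait(interval_sec)
    return swallowed
```

Only the consistency error is caught here; anything else is still allowed to abort the thread, which keeps genuine bugs visible.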

Fixes the following issue:

2022-03-22 10:34:37,697 [ERROR] {byte-count-updater} Aborting due to an error
Traceback (most recent call last):
  File "/opt/seagate/cortx/hare/lib64/python3.6/site-packages/hax/bytecount.py", line 149, in _execute
    pver_items = self._get_pver_with_pver_status(motr)
  File "/opt/seagate/cortx/hare/lib64/python3.6/site-packages/hax/bytecount.py", line 63, in _get_pver_with_pver_status
    iosservice_items = self.consul.kv.kv_get('ioservices/', recurse=True)
  File "/opt/seagate/cortx/hare/lib64/python3.6/site-packages/hax/consul/cache.py", line 126, in wrapper
    ret_value = f(*args, **kwds)
  File "/opt/seagate/cortx/hare/lib64/python3.6/site-packages/hax/util.py", line 214, in kv_get
    raise HAConsistencyException('Could not get data from Consul KV')
hax.exception.HAConsistencyException
2022-03-22 10:34:37,700 [DEBUG] {byte-count-updater} byte-count updater thread exited

Logs showing that the exception no longer aborts the thread:

Mar 22 06:06:09 ssc-vm-g3-rhev4-2743.colo.seagate.com hare-hax[106381]: 2022-03-22 06:06:09,649 [DEBUG] {byte-count-updater} Received bytecount: ByteCountStats(proc_fid=0x7200000000000001:0x4a, pvers=[])
Mar 22 06:06:09 ssc-vm-g3-rhev4-2743.colo.seagate.com hare-hax[106381]: 2022-03-22 06:06:09,649 [DEBUG] {byte-count-updater} KVGET key=ioservices/, kwargs={'recurse': True}
Mar 22 06:06:09 ssc-vm-g3-rhev4-2743.colo.seagate.com hare-hax[106381]: 2022-03-22 06:06:09,652 [DEBUG] {byte-count-updater} Failed to update Consul KV due to an intermittent error. The error is swallowed since new attempts will be made timely
Mar 22 06:06:39 ssc-vm-g3-rhev4-2743.colo.seagate.com hare-hax[106381]: 2022-03-22 06:06:39,654 [DEBUG] {byte-count-updater} KVGET key=leader, kwargs={}

Signed-off-by: Shreya Karmakar <shreya.karmakar@seagate.com>

Shreya-18 commented 2 years ago

Custom build - https://eos-jenkins.colo.seagate.com/job/GitHub-custom-ci-builds/job/generic/job/custom-ci/5427/

Deployment - https://eos-jenkins.colo.seagate.com/job/Cortx-Automation/job/RGW/job/setup-cortx-rgw-cluster/1298/

-----------------------------------[ Image Details ]--------------------------------------
ghcr.io/seagate/consul:1.10.0
cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-5427-custom-ci
cortx-docker.colo.seagate.com/seagate/cortx-rgw:2.0.0-5427-custom-ci
ghcr.io/seagate/kafka:3.0.0-debian-10-r7
ghcr.io/seagate/symas-openldap:2.4.58
ghcr.io/seagate/zookeeper:3.7.0-debian-10-r182
---------------------------------------[ POD Status ]--------------------------------------
NAME                                                 READY   STATUS    RESTARTS   AGE     IP               NODE                                    NOMINATED NODE   READINESS GATES
consul-client-8pqdt                                  1/1     Running   0          7m7s    172.18.189.1     ssc-vm-g2-rhev4-2621.colo.seagate.com              
consul-client-sprc2                                  1/1     Running   0          7m34s   172.18.149.238   ssc-vm-g2-rhev4-2610.colo.seagate.com              
consul-client-wlj9n                                  1/1     Running   0          7m33s   172.18.251.168   ssc-vm-g2-rhev4-2611.colo.seagate.com              
consul-server-0                                      1/1     Running   0          6m1s    172.18.189.20    ssc-vm-g2-rhev4-2621.colo.seagate.com              
consul-server-1                                      1/1     Running   0          6m47s   172.18.251.162   ssc-vm-g2-rhev4-2611.colo.seagate.com              
consul-server-2                                      1/1     Running   0          7m38s   172.18.149.236   ssc-vm-g2-rhev4-2610.colo.seagate.com              
cortx-control-c87d69466-k7t2j                        1/1     Running   0          4m25s   172.18.189.14    ssc-vm-g2-rhev4-2621.colo.seagate.com              
cortx-data-ssc-vm-g2-rhev4-2610-5648b4d996-wlxc4     4/4     Running   0          3m35s   172.18.149.251   ssc-vm-g2-rhev4-2610.colo.seagate.com              
cortx-data-ssc-vm-g2-rhev4-2611-6fb58c699-q26t8      4/4     Running   0          3m34s   172.18.251.145   ssc-vm-g2-rhev4-2611.colo.seagate.com              
cortx-data-ssc-vm-g2-rhev4-2621-554b5cdb4-v8ss8      4/4     Running   0          3m33s   172.18.189.52    ssc-vm-g2-rhev4-2621.colo.seagate.com              
cortx-ha-696c8788cb-shhdc                            3/3     Running   0          40s     172.18.189.9     ssc-vm-g2-rhev4-2621.colo.seagate.com              
cortx-server-ssc-vm-g2-rhev4-2610-7bb966b64d-h5jvp   2/2     Running   0          2m5s    172.18.149.198   ssc-vm-g2-rhev4-2610.colo.seagate.com              
cortx-server-ssc-vm-g2-rhev4-2611-695dbf9cd-g8t8q    2/2     Running   0          2m4s    172.18.251.172   ssc-vm-g2-rhev4-2611.colo.seagate.com              
cortx-server-ssc-vm-g2-rhev4-2621-6f66f68694-rt27l   2/2     Running   0          2m4s    172.18.189.13    ssc-vm-g2-rhev4-2621.colo.seagate.com              
kafka-0                                              1/1     Running   0          5m40s   172.18.189.63    ssc-vm-g2-rhev4-2621.colo.seagate.com              
kafka-1                                              1/1     Running   0          5m40s   172.18.251.139   ssc-vm-g2-rhev4-2611.colo.seagate.com              
kafka-2                                              1/1     Running   0          5m40s   172.18.149.196   ssc-vm-g2-rhev4-2610.colo.seagate.com              
openldap-0                                           1/1     Running   0          7m38s   172.18.149.239   ssc-vm-g2-rhev4-2610.colo.seagate.com              
openldap-1                                           1/1     Running   0          7m22s   172.18.251.166   ssc-vm-g2-rhev4-2611.colo.seagate.com              
openldap-2                                           1/1     Running   0          6m59s   172.18.189.18    ssc-vm-g2-rhev4-2621.colo.seagate.com              
zookeeper-0                                          1/1     Running   0          6m22s   172.18.189.4     ssc-vm-g2-rhev4-2621.colo.seagate.com              
zookeeper-1                                          1/1     Running   0          6m22s   172.18.251.157   ssc-vm-g2-rhev4-2611.colo.seagate.com              
zookeeper-2                                          1/1     Running   0          6m22s   172.18.149.254   ssc-vm-g2-rhev4-2610.colo.seagate.com              
-----------[ Sleeping for 1min before checking hctl status.... ]--------------------
---------------------------------------[ hctl status ]-----------------------------------------
Tue Mar 22 06:49:02 MDT 2022
Bytecount:
    critical : 0
    damaged : 0
    degraded : 0
    healthy : 0
Data pool:
    # fid name
    0x6f00000000000001:0x93 'storage-set-1__sns'
Profile:
    # fid name: pool(s)
    0x7000000000000001:0xe3 'Profile_the_pool': 'storage-set-1__sns' 'storage-set-1__dix' None
Services:
    cortx-data-headless-svc-ssc-vm-g2-rhev4-2610  (RC)
    [started]  hax                 0x7200000000000001:0x2b         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2610@22001
    [started]  ioservice           0x7200000000000001:0x2e         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2610@21001
    [started]  ioservice           0x7200000000000001:0x3b         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2610@21002
    [started]  confd               0x7200000000000001:0x48         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2610@21003
    cortx-data-headless-svc-ssc-vm-g2-rhev4-2621 
    [started]  hax                 0x7200000000000001:0x7          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2621@22001
    [started]  ioservice           0x7200000000000001:0xa          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2621@21001
    [started]  ioservice           0x7200000000000001:0x17         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2621@21002
    [started]  confd               0x7200000000000001:0x24         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2621@21003
    cortx-data-headless-svc-ssc-vm-g2-rhev4-2611 
    [started]  hax                 0x7200000000000001:0x4f         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2611@22001
    [started]  ioservice           0x7200000000000001:0x52         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2611@21001
    [started]  ioservice           0x7200000000000001:0x5f         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2611@21002
    [started]  confd               0x7200000000000001:0x6c         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2611@21003
    cortx-server-headless-svc-ssc-vm-g2-rhev4-2621 
    [started]  hax                 0x7200000000000001:0x71         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2621@22001
    [started]  rgw                 0x7200000000000001:0x74         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2621@21501
    cortx-server-headless-svc-ssc-vm-g2-rhev4-2610 
    [started]  hax                 0x7200000000000001:0x79         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2610@22001
    [started]  rgw                 0x7200000000000001:0x7c         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2610@21501
    cortx-server-headless-svc-ssc-vm-g2-rhev4-2611 
    [started]  hax                 0x7200000000000001:0x81         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2611@22001
    [started]  rgw                 0x7200000000000001:0x84         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2611@21501
-----------[ Time taken for service to start 0 mins ]--------------------

Performed IOs (4+2 config)

[root@ssc-vm-g2-rhev4-2610 ~]# aws s3 mb s3://test-bucket --endpoint-url http://192.168.54.124:30080
make_bucket: test-bucket
[root@ssc-vm-g2-rhev4-2610 ~]# aws s3 cp /tmp/9M s3://test-bucket/object1 --endpoint-url http://192.168.54.124:30080
upload: ../tmp/9M to s3://test-bucket/object1
[root@ssc-vm-g2-rhev4-2610 ~]# aws s3 ls s3://test-bucket --endpoint-url http://192.168.54.124:30080
2022-03-22 07:22:03    9437184 object1

[root@cortx-data-headless-svc-ssc-vm-g2-rhev4-2610 /]# hctl status
Bytecount:
    critical : 0
    damaged : 0
    degraded : 0
    healthy : 12582912
Data pool:
    # fid name
    0x6f00000000000001:0x93 'storage-set-1__sns'
Profile:
    # fid name: pool(s)
    0x7000000000000001:0xe3 'Profile_the_pool': 'storage-set-1__sns' 'storage-set-1__dix' None
Services:
    cortx-data-headless-svc-ssc-vm-g2-rhev4-2621
    [started]  hax                 0x7200000000000001:0x2b         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2621@22001
    [started]  ioservice           0x7200000000000001:0x2e         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2621@21001
    [started]  ioservice           0x7200000000000001:0x3b         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2621@21002
    [started]  confd               0x7200000000000001:0x48         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2621@21003
    cortx-data-headless-svc-ssc-vm-g2-rhev4-2610
    [started]  hax                 0x7200000000000001:0x7          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2610@22001
    [started]  ioservice           0x7200000000000001:0xa          inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2610@21001
    [started]  ioservice           0x7200000000000001:0x17         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2610@21002
    [started]  confd               0x7200000000000001:0x24         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2610@21003
    cortx-data-headless-svc-ssc-vm-g2-rhev4-2611  (RC)
    [started]  hax                 0x7200000000000001:0x4f         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2611@22001
    [started]  ioservice           0x7200000000000001:0x52         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2611@21001
    [started]  ioservice           0x7200000000000001:0x5f         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2611@21002
    [started]  confd               0x7200000000000001:0x6c         inet:tcp:cortx-data-headless-svc-ssc-vm-g2-rhev4-2611@21003
    cortx-server-headless-svc-ssc-vm-g2-rhev4-2610
    [started]  hax                 0x7200000000000001:0x71         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2610@22001
    [started]  rgw                 0x7200000000000001:0x74         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2610@21501
    cortx-server-headless-svc-ssc-vm-g2-rhev4-2621
    [started]  hax                 0x7200000000000001:0x79         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2621@22001
    [started]  rgw                 0x7200000000000001:0x7c         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2621@21501
    cortx-server-headless-svc-ssc-vm-g2-rhev4-2611
    [started]  hax                 0x7200000000000001:0x81         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2611@22001
    [started]  rgw                 0x7200000000000001:0x84         inet:tcp:cortx-server-headless-svc-ssc-vm-g2-rhev4-2611@21501

vaibhavparatwar commented 2 years ago

@mssawant @SwapnilGaonkar7 - can we review and merge this today?