Closed tshaffe1 closed 2 years ago
@SwapnilGaonkar7 @mssawant @Shreya-18 please review
@trshaffer, looks great, can you please post results of the test run?
The passing test is visible in the part of the 1 node deployment that ran successfully (https://eos-jenkins.colo.seagate.com/job/Cortx-PR-Build/job/Cortx-Deployment/job/Generic/job/hare/243/console). If I run the test myself, it gives the following output:
# pytest -rP test/test_failure.py
=================================================================== test session starts ====================================================================
platform linux -- Python 3.6.8, pytest-6.2.4, py-1.11.0, pluggy-0.13.1
rootdir: /root/cortx-hare/hax
plugins: timeout-1.4.2, aiohttp-0.3.0, mock-3.6.1, cov-3.0.0
collected 1 item
test/test_failure.py . [100%]
========================================================================== PASSES ==========================================================================
_____________________________________________________________ TestFailure.test_process_failure _____________________________________________________________
-------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------
DEBUG hax:__init__.py:297 Broadcasting HA states [HAState(fid=0x7200000000000001:0x15, status=FAILED)] over ha_link
DEBUG hax:__init__.py:513 Process fid=0x7200000000000001:0x15 encloses 1 services as follows: [FidWithType(fid=0x7300000000000001:0xe, service_type='ios')]
TRACE hax:cache.py:79 CACHE: created. fn_name=Motr._generate_sub_disks
DEBUG hax:__init__.py:550 proc fid=0x7200000000000001:0x15 encloses 1 disks as follows: [0x6b00000000000001:0x11]
DEBUG hax:__init__.py:634 Notifying node status for process_fid=0x7200000000000001:0x15 state=ObjHealth.FAILED
DEBUG hax:__init__.py:597 node_fid: 0x6e00000000000001:0x3 encl_fid: 0x6500000000000001:0x4 ctrl_fids: [0x6300000000000001:0x5] with state: ObjHealth.FAILED
==================================================================== 1 passed in 0.12s =====================================================================
@mssawant did you also want full output of the other tests?
@trshaffer, no, this looks good. Thanks.
retest this please
retest this please
This is a unit test to check behavior on single node failure. We invoke the handler for a state update indicating a process is gone, then check that the appropriate components (drive, controller, node, etc.) are marked failed in Consul. We also check that the failures are broadcast through hax.
This test only covers a particular code path (total node failure). There are other code paths for different scenarios (e.g. drive failure or service crash) that will require additional unit tests.