mdrakiburrahman opened this issue 3 years ago
I made the changes in this PR; however, high availability still does not work.
This is the log for the first node:
[01/06/2022 06:53:21] Registering SHIR with the node key: IR@8c0caf9f-7864-4443-9432-ab08946a227d@TmutDataFactory@ServiceEndpoint=tmutdatafactory.westus2.datafactory.azure.net@DX5HMJGj+Re+CY1jb44stt6oUTwBLkhik9kvVy7GCQc=
[01/06/2022 06:53:21] Registering SHIR with the node name: Node_1
[01/06/2022 06:53:21] Registering SHIR with the enable high availability flag: true
[01/06/2022 06:53:21] Registering SHIR with the tcp port: 8060
[01/06/2022 06:53:33] Start registering the new SHIR node
[01/06/2022 06:53:33] Enable High Availability
[01/06/2022 06:53:39] High Availability Enabled
[01/06/2022 06:53:39] Registering New Node Node_1
[01/06/2022 06:53:50] Node Node_1 Registered
[01/06/2022 06:53:50] Waiting 180 seconds before attempting first health check
[01/06/2022 06:56:50] Check-Main-Process return true
[01/06/2022 06:56:53] Check-Node-Connection return true ConnectionResult: Connected
[01/06/2022 06:56:53] Node Health Check Pass
This is the log for the second node:
[01/06/2022 07:34:08] Registering SHIR with the node key: IR@8c0caf9f-7864-4443-9432-ab08946a227d@TmutDataFactory@ServiceEndpoint=tmutdatafactory.westus2.datafactory.azure.net@DX5HMJGj+Re+CY1jb44stt6oUTwBLkhik9kvVy7GCQc=
[01/06/2022 07:34:08] Registering SHIR with the node name: Node_2
[01/06/2022 07:34:08] Registering SHIR with the enable high availability flag: true
[01/06/2022 07:34:08] Registering SHIR with the tcp port: 8060
[01/06/2022 07:34:20] Start registering the new SHIR node
[01/06/2022 07:34:20] Enable High Availability
[01/06/2022 07:34:25] High Availability Enabled
[01/06/2022 07:34:25] Registering New Node Node_2
[01/06/2022 07:34:36] Node Node_2 Registered
[01/06/2022 07:34:36] Waiting 180 seconds before attempting first health check
[01/06/2022 07:37:36] Check-Main-Process return true
[01/06/2022 07:37:39] Check-Node-Connection error Node is offline ConnectionResult: Connecting
[01/06/2022 07:37:39] Node Health Check Failed
[01/06/2022 07:37:39] Stop the node connection
[01/06/2022 07:37:56] Stop the node connection successfully
Notice that the connection result for Node_2 shows as Connecting, whereas Node_1 reports Connected.
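When comparing several node logs like the two above, it can help to pull out just the ConnectionResult and the health-check verdict. The following is a minimal Python sketch, assuming the flattened "[MM/DD/YYYY HH:MM:SS] message" format seen in these two excerpts; the function and class names are hypothetical and not part of the SHIR container scripts.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed format: every entry begins with "[MM/DD/YYYY HH:MM:SS] ".
ENTRY_RE = re.compile(r"\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\] ")
CONN_RE = re.compile(r"ConnectionResult: (\w+)")


@dataclass
class HealthSummary:
    connection_result: Optional[str]  # e.g. "Connected" or "Connecting"
    passed: bool  # True only if "Node Health Check Pass" was logged


def split_entries(log_text: str) -> List[Tuple[str, str]]:
    """Split a flattened log into (timestamp, message) pairs."""
    # With one capture group, re.split yields [prefix, ts1, msg1, ts2, msg2, ...].
    parts = ENTRY_RE.split(log_text)
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


def summarize(log_text: str) -> HealthSummary:
    """Return the last reported ConnectionResult and whether the check passed."""
    result: Optional[str] = None
    passed = False
    for _, message in split_entries(log_text):
        match = CONN_RE.search(message)
        if match:
            result = match.group(1)
        if message.startswith("Node Health Check Pass"):
            passed = True
    return HealthSummary(connection_result=result, passed=passed)
```

Fed the tail of the Node_2 log, `summarize` should report a connection result of "Connecting" with `passed` set to False, which matches the failure described above.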
When adding a second container with ENABLE_HA and HA_PORT specified, the container doesn't get registered correctly and also fails the health check.

I found that the following fixes the issue:
1) Use -EnableRemoteAccessInContainer instead of -EnableRemoteAccess.
2) Order of operations matters on the second node: -EnableRemoteAccessInContainer must run first, before -RegisterNewNode is executed.

With this setup, the second node is successful:
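For reference, the two fixes in the list above amount to a specific sequence of dmgcmd calls on the second node. The following is a minimal Python sketch of that ordering, assuming dmgcmd's documented `-EnableRemoteAccessInContainer <port>` and `-RegisterNewNode <key> [<nodeName>]` options; the dmgcmd path and the helper names are hypothetical placeholders, and the sketch defaults to a dry run.

```python
import subprocess
from typing import List

# Hypothetical install path; the version folder depends on the installed IR.
DMGCMD = r"C:\Program Files\Microsoft Integration Runtime\5.0\Shared\dmgcmd.exe"


def second_node_commands(
    auth_key: str, node_name: str, port: int, dmgcmd: str = DMGCMD
) -> List[List[str]]:
    """Return dmgcmd invocations for joining an HA node, in the order that worked.

    Fix 1: use the container-specific remote-access flag.
    Fix 2: enable remote access *before* registering the node.
    """
    return [
        [dmgcmd, "-EnableRemoteAccessInContainer", str(port)],
        [dmgcmd, "-RegisterNewNode", auth_key, node_name],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order, stopping on failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a step fails,
            # so registration never runs after a failed remote-access step.
            subprocess.run(cmd, check=True)
```

For example, `run_all(second_node_commands("<auth key>", "Node_2", 8060))` prints the two commands in order without executing anything. Returning plain argument lists keeps the ordering easy to inspect and test before running anything inside the container.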
Enables HA: