Azure / Azure-Data-Factory-Integration-Runtime-in-Windows-Container

Azure Data Factory Integration Runtime in Windows Container Sample
MIT License

Adding second container doesn't work for HA due to order of operations and incorrect remote access command #4

Open mdrakiburrahman opened 3 years ago

mdrakiburrahman commented 3 years ago

When adding a second container with ENABLE_HA and HA_PORT specified, the container is not registered correctly and also fails the health check:
[screenshot]

I found that the following fixes the issue:
1) Use -EnableRemoteAccessInContainer instead of -EnableRemoteAccess.
2) Order of operations matters on the second node: run -EnableRemoteAccessInContainer first, before -RegisterNewNode is executed.
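The two fixes amount to a fixed command order on the second node. The following Python sketch only builds the `dmgcmd.exe` argument lists in that order. The function names, the `dmgcmd` path parameter, and passing HA_PORT as the port argument of `-EnableRemoteAccessInContainer` are assumptions for illustration, not the repository's actual setup script.

```python
import subprocess
from typing import List


def build_node_commands(dmgcmd: str, auth_key: str, node_name: str,
                        enable_ha: bool, ha_port: int) -> List[List[str]]:
    """Return hypothetical dmgcmd.exe invocations in the order reported to work."""
    commands: List[List[str]] = []
    if enable_ha:
        # Step 1: enable remote access on the HA port *inside the container*.
        # The issue reports that plain -EnableRemoteAccess does not work here.
        commands.append([dmgcmd, "-EnableRemoteAccessInContainer", str(ha_port)])
    # Step 2: register the node only after remote access has been enabled.
    commands.append([dmgcmd, "-RegisterNewNode", auth_key, node_name])
    return commands


def run_commands(commands: List[List[str]]) -> None:
    """Execute each command in order, raising on the first non-zero exit code."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `build_node_commands(r"C:\path\to\dmgcmd.exe", "<auth key>", "Node_2", True, 8060)` yields the `-EnableRemoteAccessInContainer 8060` call followed by `-RegisterNewNode`, and `run_commands` would execute them in that order. Keeping the HA step conditional means a container started without ENABLE_HA simply registers.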

With this setup, the second node registers successfully:
[screenshot]

High availability is enabled:
[screenshot]

curtis-durrett commented 2 years ago

I made the changes in this PR; however, high availability still does not work.

This is the log for the first node:
[01/06/2022 06:53:21] Registering SHIR with the node key: IR@8c0caf9f-7864-4443-9432-ab08946a227d@TmutDataFactory@ServiceEndpoint=tmutdatafactory.westus2.datafactory.azure.net@DX5HMJGj+Re+CY1jb44stt6oUTwBLkhik9kvVy7GCQc=
[01/06/2022 06:53:21] Registering SHIR with the node name: Node_1
[01/06/2022 06:53:21] Registering SHIR with the enable high availability flag: true
[01/06/2022 06:53:21] Registering SHIR with the tcp port: 8060
[01/06/2022 06:53:33] Start registering the new SHIR node
[01/06/2022 06:53:33] Enable High Availability
[01/06/2022 06:53:39] High Availability Enabled
[01/06/2022 06:53:39] Registering New Node Node_1
[01/06/2022 06:53:50] Node Node_1 Registered
[01/06/2022 06:53:50] Waiting 180 seconds before attempting first health check
[01/06/2022 06:56:50] Check-Main-Process return true
[01/06/2022 06:56:53] Check-Node-Connection return true ConnectionResult: Connected
[01/06/2022 06:56:53] Node Health Check Pass

This is the log for the second node:
[01/06/2022 07:34:08] Registering SHIR with the node key: IR@8c0caf9f-7864-4443-9432-ab08946a227d@TmutDataFactory@ServiceEndpoint=tmutdatafactory.westus2.datafactory.azure.net@DX5HMJGj+Re+CY1jb44stt6oUTwBLkhik9kvVy7GCQc=
[01/06/2022 07:34:08] Registering SHIR with the node name: Node_2
[01/06/2022 07:34:08] Registering SHIR with the enable high availability flag: true
[01/06/2022 07:34:08] Registering SHIR with the tcp port: 8060
[01/06/2022 07:34:20] Start registering the new SHIR node
[01/06/2022 07:34:20] Enable High Availability
[01/06/2022 07:34:25] High Availability Enabled
[01/06/2022 07:34:25] Registering New Node Node_2
[01/06/2022 07:34:36] Node Node_2 Registered
[01/06/2022 07:34:36] Waiting 180 seconds before attempting first health check
[01/06/2022 07:37:36] Check-Main-Process return true
[01/06/2022 07:37:39] Check-Node-Connection error Node is offline ConnectionResult: Connecting
[01/06/2022 07:37:39] Node Health Check Failed
[01/06/2022 07:37:39] Stop the node connection
[01/06/2022 07:37:56] Stop the node connection successfully

Notice that the connection result for Node_2 shows as Connecting.
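Because the difference between the two nodes comes down to the ConnectionResult value, a small helper can pull it out of a pasted log for comparison. This is a hypothetical Python sketch, not part of the repository's health-check scripts. It assumes only the `[01/06/2022 07:37:39] message` layout visible in the logs above, whether the entries are on one line or many.

```python
import re
from typing import List, Optional, Tuple

# Each log entry begins with a bracketed timestamp, e.g. "[01/06/2022 07:37:39]".
# The separator also consumes the whitespace that follows the closing bracket.
_TIMESTAMP = re.compile(r"\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]\s*")
_RESULT = re.compile(r"ConnectionResult:\s*(\w+)")


def split_entries(log: str) -> List[Tuple[str, str]]:
    """Split a log into (timestamp, message) pairs."""
    parts = _TIMESTAMP.split(log)
    # re.split with one capture group yields [prefix, ts1, msg1, ts2, msg2, ...].
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]


def last_connection_result(log: str) -> Optional[str]:
    """Return the most recent ConnectionResult value, or None if there is none."""
    for _, message in reversed(split_entries(log)):
        match = _RESULT.search(message)
        if match:
            return match.group(1)
    return None
```

Applied to the two logs above, `last_connection_result` returns "Connected" for Node_1 and "Connecting" for Node_2.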

curtis-durrett commented 2 years ago

[screenshot: IR_Status]