Closed Mats-Elfving closed 9 months ago
Which environment variables do you set when starting the container?
Hi, this is an extract from the deployment YAML:
```yaml
env:
  - name: AUTH_KEY
    valueFrom:
      secretKeyRef:
        key: SECRET_AUTH_KEY_1
        name: app-secrets
  - name: NODE_NAME # name as registered in synapse
    value: "t-nx-mirage-synapse-shir"
  - name: ENABLE_HA # The flag to enable high availability and scalability. It supports up to 4 nodes registered to the same IR when HA is enabled, otherwise only 1 is allowed.
    value: "false" # default is false
  - name: HA_PORT # The port to set up a high availability cluster.
    value: "8060" # default is 8060
  - name: ENABLE_AE # The flag to enable offline nodes auto-expiration. If enabled, the expired nodes will be removed automatically from the IR when a new node is attempting to register.
    # Only works when ENABLE_HA=true.
    value: "false" # default is false
  - name: AE_TIME # The expiration timeout duration for offline nodes in seconds. Should be no less than 600 (10 minutes).
    value: "600" # Least time = 10 minutes = 600 seconds
```
Hi @Mats-Elfving
When the container crashed, the former node was not detached automatically, which caused the problem.
Please set the flags `ENABLE_HA` and `ENABLE_AE` to true. This lets your integration runtime accept the registration of the restarted container and remove the former node automatically after the duration specified by `AE_TIME` (default is 10 minutes).
Reading the comments on those flags again, I realize that you are of course right. Would this (HA_PORT) also allow me to deploy two (or more) pods with different node names? How is HA_PORT used? Is it an inter-pod communication port, a port that should be open for traffic from outside the cluster, or one used only within the service?
Sure, `HA_PORT` was originally designed for multi-node deployments.
It is used only among the nodes in the same cluster; in your scenario, it is an inter-pod port.
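Since the port is only needed between the pods, one way to express that in Kubernetes is a NetworkPolicy that admits traffic on it only from sibling pods. The following is a minimal sketch, not a tested configuration: the policy name and the `app: synapse-shir` label are hypothetical, and TCP is assumed as the protocol for `HA_PORT`.

```yaml
# Hypothetical sketch: the policy name and the app label are assumptions,
# not values taken from this thread.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: shir-ha-port
spec:
  # Applies to the integration runtime pods (assumed label).
  podSelector:
    matchLabels:
      app: synapse-shir
  policyTypes:
    - Ingress
  ingress:
    # Allow inbound traffic on HA_PORT only from pods carrying the same label.
    - from:
        - podSelector:
            matchLabels:
              app: synapse-shir
      ports:
        - protocol: TCP   # assumed protocol
          port: 8060      # must match the HA_PORT value
```

Note that a policy like this also blocks every other inbound connection to the selected pods, and it only takes effect if a network policy plugin is enabled on the cluster.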
Thank you for the help setting this up. I ended up using the settings below. Taking the node name from the pod name gives a name that is relevant and at the same time unique.
```yaml
env:
  - name: AUTH_KEY
    valueFrom:
      secretKeyRef:
        key: SECRET_AUTH_KEY_1
        name: app-secrets
  - name: NODE_NAME # name as registered in synapse
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: ENABLE_HA # The flag to enable high availability and scalability. It supports up to 4 nodes registered to the same IR when HA is enabled, otherwise only 1 is allowed.
    value: "true" # default is false
  - name: HA_PORT # The port to set up a high availability cluster.
    value: "8060" # default is 8060
  - name: ENABLE_AE # The flag to enable offline nodes auto-expiration. If enabled, the expired nodes will be removed automatically from the IR when a new node is attempting to register.
    # Only works when ENABLE_HA=true.
    value: "true" # default is false
  - name: AE_TIME # The expiration timeout duration for offline nodes in seconds. Should be no less than 600 (10 minutes).
    value: "600" # Least time = 10 minutes = 600 seconds
  - name: ORACLE_HOME
    value: "C:\\Oracle\\instantclient_21_11"
  - name: TNS_ADMIN
    value: "C:\\Oracle\\instantclient_21_11\\network\\admin"
```
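For anyone wanting to run two pods with these settings, the sketch below shows roughly where the `env` entries could sit in a Deployment. It is an illustration under assumptions rather than the manifest used in this thread: the Deployment name, the `app` label, the container name, and the image are all placeholders.

```yaml
# Hypothetical sketch: name, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: synapse-shir
spec:
  replicas: 2   # ENABLE_HA allows up to 4 nodes on the same IR
  selector:
    matchLabels:
      app: synapse-shir
  template:
    metadata:
      labels:
        app: synapse-shir
    spec:
      containers:
        - name: shir
          image: example.azurecr.io/shir:latest   # placeholder image
          ports:
            - containerPort: 8060   # matches HA_PORT
          env:
            # Each pod registers under its own pod name.
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ENABLE_HA
              value: "true"
            - name: HA_PORT
              value: "8060"
            - name: ENABLE_AE
              value: "true"
            - name: AE_TIME
              value: "600"
            # AUTH_KEY and the Oracle variables from the settings above
            # are omitted here for brevity.
```

Pod names created by a Deployment change on every restart, so each replacement pod registers as a new node. The stale entries presumably count toward the four-node limit until auto-expiration removes them, which is worth keeping in mind when choosing the replica count.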
Closing this! :)
Problem: when the container terminates non-gracefully, it is unable to connect to Synapse during restart.
Background: I have set up the container as suggested, with the alterations below.
It is running fine on an AKS cluster, and the deployment is controlled using ArgoCd. When I (or the AKS backend) terminate or delete the pod, it is restarted by ArgoCd. The restart then fails with the error "Registration of new node is forbidden when Remote Access is disabled on another node. To enable it, you can login the machine where the other node is installed and run 'dmgcmd.exe -EnableRemoteAccess "" [""]'."
As a workaround, I can delete the node in Synapse and then restart the pod. With these steps it is able to reconnect. How can I get around this problem?