By default, the Helm chart deploys a standalone Redis instance called "terrakube-redis-master" in your namespace. Could you validate that Redis was deployed?
Yes, I can confirm Redis was deployed as expected. Redis logs:
1:C 01 Mar 2024 14:46:33.396 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 01 Mar 2024 14:46:33.396 # Redis version=7.0.11, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 01 Mar 2024 14:46:33.396 # Configuration loaded
1:M 01 Mar 2024 14:46:33.396 * monotonic clock: POSIX clock_gettime
1:M 01 Mar 2024 14:46:33.397 * Running mode=standalone, port=6379.
1:M 01 Mar 2024 14:46:33.401 # Server initialized
1:M 01 Mar 2024 14:46:33.417 * Reading RDB base file on AOF loading...
1:M 01 Mar 2024 14:46:33.417 * Loading RDB produced by version 7.0.11
1:M 01 Mar 2024 14:46:33.417 * RDB age 91225 seconds
1:M 01 Mar 2024 14:46:33.417 * RDB memory usage when created 0.82 Mb
1:M 01 Mar 2024 14:46:33.417 * RDB is base AOF
1:M 01 Mar 2024 14:46:33.417 * Done loading RDB, keys loaded: 0, keys expired: 0.
1:M 01 Mar 2024 14:46:33.417 * DB loaded from base file appendonly.aof.1.base.rdb: 0.009 seconds
1:M 01 Mar 2024 14:46:33.417 * DB loaded from append only file: 0.009 seconds
1:M 01 Mar 2024 14:46:33.417 * Opening AOF incr file appendonly.aof.1.incr.aof on server start
1:M 01 Mar 2024 14:46:33.417 * Ready to accept connections
When the API is deployed it creates a secret (terrakube-api-secrets) that holds the Redis connection settings.
It looks like the API cannot resolve the Redis service; I'm not sure whether this is a connectivity issue between your pods.
The same will happen with the executor component.
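If it helps, here is a minimal throwaway Pod you could run to test DNS resolution and connectivity to the bundled Redis service. It assumes the default service name "terrakube-redis-master"; the bundled Redis may require a password, but even a NOAUTH reply proves the network path and DNS work:

```yaml
# Hypothetical one-off debug Pod; assumes the default service name "terrakube-redis-master".
apiVersion: v1
kind: Pod
metadata:
  name: redis-connectivity-check
spec:
  restartPolicy: Never
  containers:
    - name: redis-cli
      image: redis:7
      # Even a NOAUTH error reply proves DNS resolution and network connectivity.
      command: ["redis-cli", "-h", "terrakube-redis-master", "-p", "6379", "ping"]
```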
I checked the connectivity between pods from a different app in the same Argo project (meaning they share the same network policies) and I can't find any connectivity issue between the pods of that app. I also deployed an Azure Redis Cache to test with an external Redis, but both pods still degrade after some time, even though I can see server and memory load on the external Redis.
After some more digging in the charts I found that the Redis config is set to the insecure port 6379 by default, which is not enabled on Azure Redis Cache; it uses 6380 (SSL) instead. So I allowed insecure connections and all errors were gone within seconds (yay). The only thing I'm still clueless about is why the in-cluster Redis connections were failing, and why Terrakube can't create an ingress connection right now.
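For reference, a rough sketch of what pointing the chart at an external Azure Cache for Redis over SSL might look like. The key names here are assumptions on my part, so check the chart's values.yaml for the exact structure:

```yaml
# Hypothetical values excerpt; key names are assumptions, verify against the chart's values.yaml.
redis:
  enabled: false                              # skip the bundled standalone Redis
externalRedis:
  hostname: mycache.redis.cache.windows.net   # example Azure Cache for Redis endpoint
  port: 6380                                  # Azure's SSL port (6379 is only available if non-SSL access is enabled)
  ssl: true
  password: "<access-key>"
```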
Maybe you can check the Redis service name. For example, if you deploy the default Redis, it will create a Redis service with the hostname "terrakube-redis-master" (the one that is used by default).
But if you are using an external Redis in another namespace it should be "yourredisservice.namespace", if I remember correctly.
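In general, an in-cluster service can be addressed in any of the following forms; the "redis.hostname" key below is only a hypothetical illustration of where such a hostname would go, not the chart's actual key name:

```yaml
# Kubernetes service DNS forms (the values key is hypothetical; check the chart's values.yaml):
redis:
  hostname: terrakube-redis-master                               # service in the same namespace
  # hostname: yourredisservice.other-namespace                   # service in another namespace
  # hostname: yourredisservice.other-namespace.svc.cluster.local # fully qualified form
```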
With everything deployed locally it looks like the picture below for me. Yesterday, before the merge, it looked exactly the same but didn't work. I think the ingress config merge probably fixed the connection issue between the Terrakube pods, but I will check back on this later this week and close the issue if it's gone.
By the way, in AKS I think you need to use NodePort instead of ClusterIP.
I think ClusterIP only works with an ingress controller like nginx, but for cloud providers like GKE, EKS or AKS you need to use NodePort with the native cloud ingress.
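As a generic illustration (not taken from the Terrakube chart), switching a service from the default ClusterIP to NodePort looks like this:

```yaml
# Generic example service, not Terrakube-specific.
apiVersion: v1
kind: Service
metadata:
  name: example-ui
spec:
  type: NodePort        # ClusterIP (the default) keeps the service cluster-internal
  selector:
    app: example-ui
  ports:
    - port: 80
      targetPort: 8080
```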
ClusterIP is fine for our configuration since we use the nginx ingress for our AKS without a public IP. If we were using the native solutions then yes, NodePort or LoadBalancer would be better.
Hi, first of all thank you to everyone for this amazing tool! I sadly haven't gotten around to making it work yet, but it looks really promising to me. Maybe someone can help me find the error in my deployment. When deploying Terrakube with chart version 3.14.2 on my 1.28.3 AKS cluster via Helm (using Argo CD, but that should make no difference) I get errors on the API and executor pods. Error on the API pod:
My values.yaml looks like the following:
If it helps, I can also post the full logs of the API or executor pod.