Open · ffilippopoulos opened this issue 3 years ago
This is unrelated to the named tunnels feature. The behavior changed slightly with the introduction of the Quick Reconnect feature in 2020.5.1. After a disconnect, the Cloudflare backend doesn't immediately unregister the tunnel, in the hope that the same cloudflared instance will come back quickly. When you launch a new instance, there's a small period where both origins can be registered. This causes you to go over the allowed number of origins (5 in your case). That is what the `origin list length must be in range [1, 5]: validation failed` error means.
The named tunnels model is very different. We no longer automatically manage load balancer origins for you. The tunnels exist purely as a way to proxy traffic back to the origin. It's up to you to configure how traffic from the internet gets there, either via a load balancer or DNS.
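For concreteness, a minimal sketch of both options using the `tunnel route` subcommands (the tunnel name, hostname, and pool below are illustrative, and this assumes a cloudflared build recent enough to have named tunnel support):

```sh
# Create the named tunnel once; this writes a credentials JSON under ~/.cloudflared/
cloudflared tunnel create my-tunnel

# Option A: point a DNS hostname straight at the tunnel
cloudflared tunnel route dns my-tunnel app.example.com

# Option B: add the tunnel as an origin in a load balancer pool
cloudflared tunnel route lb my-tunnel lb.example.com my-pool
```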
@ipostelnik thank you very much for your response, and for explaining a bit more about the named tunnels feature.
> After a disconnect, the Cloudflare backend doesn't immediately unregister the tunnel, in the hope that the same cloudflared instance will come back quickly.
With regard to that ^, is there a way to enable/disable this behaviour to support dynamic environments like a Kubernetes deployment? Or is the only solution here to increase the number of allowed origins?
We have new recommendations for using Argo Tunnel in Kubernetes clusters: https://developers.cloudflare.com/argo-tunnel/routing-to-tunnel/kubernetes. Instead of using a sidecar, you should deploy one or more instances of cloudflared configured to proxy to your service or ingress controller. We use this model internally as well.
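A rough sketch of what one such deployment could look like (the image tag, names, service URL, and mount paths are all illustrative, and exact flags may differ between cloudflared versions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudflared-a   # the doc suggests a second deployment, e.g. cloudflared-b, with its own tunnel
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudflared-a
  template:
    metadata:
      labels:
        app: cloudflared-a
    spec:
      containers:
      - name: cloudflared
        image: cloudflare/cloudflared:2020.8.0   # illustrative tag
        args:
        - tunnel
        - run
        - --credentials-file=/etc/cloudflared/creds/credentials.json
        - --url=http://my-service.default.svc.cluster.local:80   # the upstream service
        - aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee                   # this deployment's tunnel UUID
        volumeMounts:
        - name: creds
          mountPath: /etc/cloudflared/creds
          readOnly: true
      volumes:
      - name: creds
        secret:
          secretName: tunnel-credentials-a   # the secret created in Step 1 of the doc
```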
@ipostelnik thanks for the suggestion. I've tried setting up tunnels and using their credentials to connect via pods, but I found the two-deployments solution rather odd and not a good fit for Kubernetes deployments. The link you posted says:
> Create two deployments with one replica each using cloudflared. Configure cloudflared to point to the service IP of the upstream service. Mount the secrets created in Step 1 and point cloudflared to the right path.
but that way you have multiple deployments for effectively the same app, so you break Kubernetes' automation. If, for example, you bump the cloudflared image via your CI, this will not cause a graceful rollout; both deployments will restart at the same time, giving your service a blip.
I agree that having two separate deployments (rather than replicas) with different tunnel IDs is not perfect, but it works pretty well in practice. For your automation, you should define separate template variables for each deployment's version of cloudflared. This way you can bump them separately and apply the changes.
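For example (a hypothetical Helm-style values sketch; the variable names are made up), giving each deployment its own image tag lets CI bump one tunnel at a time:

```yaml
# values.yaml
cloudflared:
  a:
    tunnelId: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
    imageTag: "2020.8.0"
  b:
    tunnelId: ffffffff-0000-1111-2222-333333333333
    imageTag: "2020.7.4"   # bump to 2020.8.0 only after deployment "a" has rolled cleanly
```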
@ipostelnik thank you for your replies, I appreciate the help. Changing our practices for that seemed a bit too much, so my solution for now is to deploy a StatefulSet and run an Alpine sidecar that does the following:
```yaml
command:
- /bin/sh
- -c
- |
  # install glibc compat so the cloudflared binary runs on Alpine
  apk add --no-cache libc6-compat
  wget -O /usr/local/bin/cloudflared https://github.com/cloudflare/cloudflared/releases/download/${CLOUDFLARED_VERSION}/cloudflared-linux-amd64
  chmod +x /usr/local/bin/cloudflared
  # map the StatefulSet pod ordinal (the suffix of $HOSTNAME) to a fixed tunnel UUID;
  # busybox /bin/sh has no [[ ]], so use a POSIX case statement
  case "${HOSTNAME##*-}" in
    0) uuid="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" ;;
    1) uuid="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" ;;
    2) uuid="ccccccccccccccccccccccccccccccccccccc" ;;
  esac
  # exec so cloudflared receives the pod's termination signals directly
  exec /usr/local/bin/cloudflared tunnel run "${uuid}"
```
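One thing this snippet leaves implicit is that `cloudflared tunnel run <uuid>` still needs the credentials JSON produced by `cloudflared tunnel create` for each UUID. A sketch of mounting them into the sidecar (the secret name and path are illustrative):

```yaml
# under the sidecar container:
volumeMounts:
- name: tunnel-creds
  mountPath: /root/.cloudflared   # where `tunnel run <uuid>` looks for <uuid>.json by default
# under the pod spec:
volumes:
- name: tunnel-creds
  secret:
    secretName: tunnel-credentials   # holds one <uuid>.json per pod ordinal
```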
Obviously this is not ideal, but it could work for now, so I guess you can feel free to close this issue. I'd be keen to watch any further development towards Kubernetes Argo Tunnel pods that deals with the >=2 deployments approach and allows more flexible setups.
@ffilippopoulos this is a great solution. We've discussed using StatefulSets, and I'm glad it's working well for you. We'll take a look at adding this suggestion to the docs.
cc: @abracchi-tw
Environment:
- Kubernetes: v1.19.2
- OS: Flatcar Container Linux by Kinvolk 2605.6.0
- kernel: 5.4.67-flatcar
- container runtime: docker://19.3.12
- cloudflared version: >= 2020.7.0
We have a Kubernetes deployment with 3 replicas that all run `cloudflared tunnel` as a sidecar, where the `hostname` argument is the name of a load balancer created under our Cloudflare account and the `lb-pool` argument a defined pool. That way the containers are able to join the pool's origins, and the load balancer sends traffic down to our pods successfully.
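Roughly along these lines (the hostname, pool, and port are placeholders, not our real values):

```sh
cloudflared tunnel --hostname lb.example.com --lb-pool my-pool --url http://localhost:8080
```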
The issue: when rolling pods, we observe that the respective origins enter `Critical` state for some time until Cloudflare cleans them up. From the cloudflared sidecar's point of view, the logs show that, because we are using the full capacity of allowed origins, the new pod cannot be registered until that cleanup happens.
It looks like this was first introduced with commit https://github.com/cloudflare/cloudflared/commit/8cc69f2a95750468d3da8f0c485944ee82931063, and one could work around it by setting the flag `--use-quick-reconnects=false`. That workaround worked up to version 2020.6.6.
The behaviour changed again with https://github.com/cloudflare/cloudflared/commit/2a3d486126449a7b9228c5a93ac1e0691bb0e00f, which implemented named tunnels. As far as I can understand, we cannot use named tunnels with a load balancer at the moment(?)
Changing our deployment to a StatefulSet, in order to be able to set a constant `--name` flag per pod and test the above, results in pods crashing. Is there a way to use the newer cloudflared images with a load balancer and preserve the old behaviour (quickly remove `Critical` origins so we can successfully roll our deployment's pods)?