The NGINX Ingress Controller was installed via Helm and the hello-kubernetes app was started on the dev k8s, after the existing NGINX was uninstalled. This test was a replication of the test @mbjones did with Docker Desktop, as described in the GNIS repo issue mentioned above. Starting NGINX and the app in this way made the NGINX Ingress Controller available as type LoadBalancer at ports 80:31199/TCP,443:31964/TCP, as shown by the kubectl get pods,services command:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default pod/ingress-nginx-controller-748d8ff6c7-m4sz7 1/1 Running 0 4m28s 192.168.3.138 docker-dev-ucsb-1 <none> <none>
hello-kubernetes pod/hello-deployment-77975745cc-9sq7l 1/1 Running 0 2m13s 192.168.50.157 docker-dev-ucsb-2 <none> <none>
hello-kubernetes pod/hello-deployment-77975745cc-vlbqv 1/1 Running 0 2m13s 192.168.3.178 docker-dev-ucsb-1 <none> <none>
hello-kubernetes pod/hello-deployment-77975745cc-whrt5 1/1 Running 0 2m13s 192.168.50.186 docker-dev-ucsb-2 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/ingress-nginx-controller LoadBalancer 10.106.90.173 <pending> 80:31199/TCP,443:31964/TCP 4m29s app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
default service/ingress-nginx-controller-admission ClusterIP 10.96.219.179 <none> 443/TCP 4m29s app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 611d <none>
hello-kubernetes service/hello-service ClusterIP 10.96.203.197 <none> 80/TCP 2m13s app=hellodemo
The NGINX LoadBalancer service's EXTERNAL-IP remains in the <pending> state.
The files, including a log file, are available at https://github.com/DataONEorg/k8s-cluster/tree/feature-%2316-nginx-load-balancer/control-plane/nginx-load-balancer
An external connection can be made to k8s via port 31964 but not 443:
On avatar (remote MBP):
avatar:~ slaughter$ curl -k https://api.test.dataone.org:31964
Hello Kubernetes!avatar:~ slaughter$
avatar:Desktop slaughter$ curl -k https://api.test.dataone.org
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api.test.dataone.org:443
Using the hostPort approach (mentioned in GNIS issue 5) may not be the solution that is chosen for DataONE k8s, but it is summarized here for future reference; a sketch of how it might look follows below. The general outline:
- change the NGINX Ingress Controller Deployment to a DaemonSet, so only one NGINX pod runs per node
- use hostPort so that ports 80/443 are available on each node running an NGINX pod
- schedule the NGINX pods only on nodes labeled ingress-control-node
If anyone has any additional ideas on how to solve this problem, please detail them here.
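For reference, here is a rough sketch of how that outline might map onto the ingress-nginx Helm chart. This is an untested assumption, not an actual DataONE config; the chart value names and the ingress-control-node label would need to be verified against the chart version in use:

# Sketch only (untested assumption):
#   controller.kind=DaemonSet     -> one controller pod per eligible node
#   controller.hostPort.enabled   -> the pod binds 80/443 directly on the node
#   controller.nodeSelector       -> only schedule on ingress-control-node labeled nodes
# The chosen nodes would first need the label, e.g.:
#   kubectl label node docker-dev-ucsb-1 ingress-control-node=true
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.kind=DaemonSet \
  --set controller.hostPort.enabled=true \
  --set-string controller.nodeSelector.ingress-control-node=true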
Hey @gothub, thanks for continuing to try getting this working.
I gleaned from the above that we're close but that the EXTERNAL-IP is still stuck in pending. I think it ended up working and I was able to get things to where I could deploy a basic webapp and route traffic to it over a subdomain I own with a working LetsEncrypt cert. I was also able to test that requests were being load balanced across pods and across nodes.
I wrote up my steps and ran through them twice to check for errors. I put them in a gist: https://gist.github.com/amoeba/07f713bb4fca8b74fbc0d314bf9e70d6.
Maybe we could go over this next week to see if it could help us get our cluster working. Let me know.
@bryce thanks for looking into this. How does Digital Ocean handle the routing from port 443 to the ingress controller? I had a look at the DO tutorial and it's not clear to me how this is done.
As far as I can tell, the ingress controller is just binding ports 80/443 on the Node (droplet), making any request like http(s)://$NODE_EXTERNAL_IP:{80,443} route to the ingress controller. DO Droplets are pretty much analogous to our static VMs at NCEAS, so I don't think there's any magic going on.
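A generic way to confirm that on one of our own nodes (assuming ss is available there) would be to check what is actually bound on the host:

# Is anything listening on 80/443 on the node itself?
sudo ss -tlnp | grep -E ':80 |:443 '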
Just to test that my steps didn't rely on anything DigitalOcean offered that we didn't have on our end, I ran through my steps on my NCEAS development VM (neutral-cat). Aside from some hiccups that weren't really related, I think things are set up and working. See https://neutral-cat.nceas.ucsb.edu (I'll be firewalling this host off soon, so you should probably be VPN'd to visit).
@bryce here are differences between the neutral-cat k8s and the DataONE k8s that may be significant:
- the NGINX Ingress Controller was installed with --set controller.publishService.enabled=true
- the controller Service was edited with kubectl edit service nginx-ingress-ingress-nginx-controller:
  # In your editor, change 'type':
  type: NodePort
  # and add an externalIPs field to 'spec':
  spec:
    externalIPs:
I'm not sure how significant these differences are, but my point is: neutral-cat is running a DigitalOcean version of k8s, not the kubeadm-installed k8s on the D1 cluster.
Could you explain this a bit more? I didn't install DO managed k8s, I just installed kubeadm on an empty Ubuntu VM. And I re-did the test on my NCEAS VM which was also a kubeadm install.
Re: the nginx ingress controller being installed with --set controller.publishService.enabled=true, I wasn't sure, but I see this in the docs for that chart:
if true, the controller will set the endpoint records on the ingress objects to reflect those on the service
There's a similar flag for the nginx-ingress-controller itself,
publish-service: Service fronting the Ingress controller. Takes the form "namespace/name". When used together with update-status, the controller mirrors the address of this service's endpoints to the load-balancer status of all Ingress objects it satisfies.
So is that doing this to my Ingress?
status:
loadBalancer:
ingress:
- ip: 128.111.85.32
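One way to check (an assumption on my part, not something from the chart docs): if publish-service is in effect, the ADDRESS column of a plain Ingress listing should mirror the controller service's address onto each Ingress it satisfies.

# ADDRESS here should match what the controller service publishes
kubectl get ingress --all-namespaces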
After talking with @mbjones and @gothub on Slack, we decided I'd try to replicate what I had done on my own VM on the Dev cluster's control plane VM.
I...
- kubectl edit'ed the Service definition to add:
  externalIPs:
  - 128.111.85.190
- removed the hostPort declaration from the NGINX Ingress Controller Deployment and verified the host ports are no longer showing as bound:
  k describe --namespace=ingress-nginx deployments ingress-nginx-controller | grep -i port
  Ports: 80/TCP, 443/TCP
  Host Ports: 0/TCP, 0/TCP
I think things are working but would love @gothub to check it over. I also left up the hello-kubernetes deployment I set up before and deployed it at https://api.test.dataone.org just to test that load balancing was happening.
The helm command that replicates the externalIPs config that @amoeba used to get ports 80/443 working is:
helm upgrade --install ingress-nginx ingress-nginx \
--version=4.0.6 \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace \
--set controller.service.type=ClusterIP \
--set controller.service.externalIPs={128.111.85.190} \
--set controller.ingressClassResource.default=true \
--set controller.ingressClassResource.name=nginx \
--set controller.admissionWebhooks.enabled=false
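A quick sanity check after the install (a sketch; the service name follows the kubectl output earlier in this issue) is to confirm that the controller service now lists the node IP under EXTERNAL-IP and plain 80/443 under PORT(S):

# Expect EXTERNAL-IP 128.111.85.190 and PORT(S) 80/TCP,443/TCP
kubectl get service --namespace ingress-nginx ingress-nginx-controller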
With this installed on dev k8s, https://stage.gnis-ld.org/data is working and provides a valid LE cert, so the ingress/secret is set up correctly. Also, https://api.test.dataone.org/quality is available, but it is not providing a valid LE cert. This may be due to a config problem or to cert-manager; more debugging is needed for this.
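A few commands that might help with that debugging, assuming cert-manager is the issuer here (the resource name and namespace below are placeholders):

# List cert-manager Certificates and their Ready status
kubectl get certificates --all-namespaces
# Drill into the certificate backing api.test.dataone.org, plus any pending ACME challenges
kubectl describe certificate <certificate-name> --namespace <namespace>
kubectl get challenges --all-namespaces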
w00t 🎉 🥇 🎉
Great to see, @gothub, thanks. I'm really glad the Helm chart had enough CLI flags for our use case.
@mbjones dropped a note in Slack today to have us check about source IP preservation. Apparently ESS-DIVE has been bitten by this before. It looks like, by default, source IPs don't get preserved, which means applications only see a cluster IP address as the source IP of incoming HTTP requests. I'm not sure if this is an immediate problem for the services we're running today, but it's almost guaranteed to be a problem in the future (i.e., if we want to deploy Metacat or if a service wants to do usage tracking or rate limiting).
I did a test on my development cluster that's running nginx-ingress-controller and I'm seeing a cluster IP instead of the real source IP when I hit a pod running its own version of nginx:
::ffff:10.32.0.8 - - [20/Nov/2021:01:16:47 +0000] "GET /favicon.ico HTTP/1.1" 404 150 "https://neutral-cat.nceas.ucsb.edu/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36"
@mbjones mentioned two links we might start on,
@gothub maybe you were already planning on figuring this out along with what you're already doing but I thought I'd drop this note for posterity.
I tested the externalTrafficPolicy: Local config mentioned in https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip and confirmed it works: (1) I see the real request source IP and (2) traffic is distributed over the available pods. You can see this in action at https://neutral-cat.nceas.ucsb.edu/ (VPN required). Let me know if you don't see your IP listed.
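For reference, one way to apply that to an existing Service (a sketch; the service name and namespace follow the dev-cluster install earlier in this issue and would differ on neutral-cat):

# Switch the controller Service to Local so the client source IP is preserved
kubectl patch service ingress-nginx-controller --namespace ingress-nginx \
  --patch '{"spec": {"externalTrafficPolicy": "Local"}}'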
They spookily mention that setting the policy to Local "risks potentially imbalanced traffic spreading." I don't understand this risk, but I do know how to use R, so I just tested it. Over 1000 requests, I get this spread across three pods:
> table(result)
hello-kubernetes-6c4fb97c77-4kbxr hello-kubernetes-6c4fb97c77-kh28b hello-kubernetes-6c4fb97c77-pwl5w
334 334 332
Looks pretty even to me, so maybe Local mode isn't a big deal?
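For anyone who wants to repeat the test without R, here is a rough shell equivalent (it assumes the app echoes its pod name in the response body, as the hello-kubernetes demo app does; the R version is what actually produced the table above):

# 1000 requests, tally which pod served each one
for i in $(seq 1 1000); do
  curl -sk https://neutral-cat.nceas.ucsb.edu/ \
    | grep -o 'hello-kubernetes-[a-z0-9]*-[a-z0-9]*' | head -n 1
done | sort | uniq -c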
Great sleuthing and nice empirical demo, @amoeba . Seems like a good way to start for our config.
@bryce - could you post your k8s service yaml that allowed you to retain the source IP?
Here's my full NIC service spec:
The fix to preserve the source IP didn't work in the test I ran recently. The NGINX Ingress Controller service definition was reconfigured to use:
...
spec:
type: NodePort
externalIPs:
- 128.111.85.190
externalTrafficPolicy: Local # <-- added this line for the test
ipFamilyPolicy: SingleStack # <-- added this line for the test
ipFamilies: # <-- added this line for the test
- IPv4 # <-- added this line for the test
...
When the service was restarted with these additions, the ingress-nginx-controller pod stopped receiving traffic.
The externalIP is set to the IP of docker-dev-ucsb-1, but currently the only instance of ingress-nginx is running on docker-dev-ucsb-2. Restarting the service with externalTrafficPolicy commented out results in traffic being routed to the NIC again.
It's possible that packets are being dropped because ingress-nginx is not running on docker-dev-ucsb-1, as mentioned here:
The recommended way to preserve the source IP in a NodePort setup is to set the value of the externalTrafficPolicy field of the ingress-nginx Service spec to Local (example).
Warning
This setting effectively drops packets sent to Kubernetes nodes which are not running any instance of the NGINX Ingress controller. Consider assigning NGINX Pods to specific nodes in order to control on what nodes the NGINX Ingress controller should be scheduled or not scheduled.
I'll run a test during k8s dev scheduled downtime to confirm this: ingress-nginx will be configured to be scheduled on docker-dev-ucsb-1, and then we can see if the externalTrafficPolicy=Local fix will route traffic to it.
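One possible way to do that pinning for the test, using the built-in kubernetes.io/hostname node label (a sketch, not necessarily how it will actually be configured):

# Pin the controller to docker-dev-ucsb-1 so it runs where the externalIP points
kubectl patch deployment ingress-nginx-controller --namespace ingress-nginx \
  --patch '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "docker-dev-ucsb-1"}}}}}'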
The above link mentioned other methods to preserve the source IP that may be easier to maintain than changing the traffic policy.
BTW - ports 80/443 are now active on both dev k8s and prod. I'll write up docs on how this was configured.
Thanks for trying it out and for the report @gothub.
This ticket is a continuation of the discussion from https://github.com/DataONEorg/gnis-deployment/issues/5.
The NGINX Ingress Controller will be configured and tested to expose ports 80/443 to external traffic (external to the cluster).