geerlingguy / 50k-k8s-jobs

50,000 Kubernetes Jobs for 50,000 Subscribers
https://www.youtube.com/watch?v=O1iEBzY7-ok
MIT License

I just passed 50,000 subscribers on my YouTube channel, so I thought I'd find a fitting way to celebrate.

For 10,000 subscribers, I ran 10,000 K8s Pods on an Amazon EKS cluster; you can find a video detailing that here: 10,000 Pods for 10k Subscribers.

So for 50,000, I was doing a few calculations and found that I'd have to pay a few hundred bucks (at least!) just for the time on an EKS cluster with enough workers to run 50,000 Pods simultaneously. So that was right out.

Instead, I'm building this Ansible playbook to automate the process of running 50,000 individual Jobs on a Kubernetes cluster!

And of course, I made a video detailing the process and thanking my subscribers: 50,000 Kubernetes Jobs for 50k Subscribers.

Local Testing

Make sure you have Ansible and Kind installed, then run the following:

  1. kind create cluster
  2. pip3 install ansible 'molecule[docker]' yamllint ansible-lint openshift (the quotes keep shells like zsh from treating the brackets as a glob pattern)
  3. ansible-galaxy install -r requirements.yml
  4. molecule converge

When you're finished, run kind delete cluster.
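
Before starting, a quick check like the following can confirm the tools are on your PATH. This is only a sketch; missing_tools is a hypothetical helper name, not something this repository provides.

```shell
# Hypothetical helper (not part of this repo): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example usage: list any of the tools used in the steps above that are missing.
# missing_tools kind ansible ansible-galaxy molecule
```

If the function prints nothing, everything it was asked about was found.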

Running on a Production Cluster

I guess this is a great opportunity to thank this project's sponsor, Linode; they not only sponsored my '50,000 Kubernetes Jobs for 50,000 Subscribers' video, but they also did two other very kind things:

  1. They gave me some credit to try out this project on their Linode Kubernetes Engine.
  2. They gave me a special link I can share with you to get a free $100 credit, valid for 60 days, on your own new Linode account!

Seriously! Go try out Linode using my link, and you can take some new infrastructure for a spin for free!

Anyways, since I knew I'd build this cluster on Linode, I went ahead and did the following:

  1. Created my Linode account (use this link for free credit!).
  2. Created a new Kubernetes Cluster.
  3. Added 10 8GB Linodes to the Cluster.
  4. Waited for the Cluster and all Nodes to boot.
  5. Downloaded the Kubeconfig file from Linode's Kubernetes UI.
  6. Told Ansible about the Kubeconfig with: export K8S_AUTH_KUBECONFIG=~/.kube/linode.yaml
  7. Ran molecule converge

That runs with all the defaults in the playbook. If you want to override them, the easiest way is to run ansible-playbook and pass extra variables:

ansible-playbook main.yml --extra-vars="{'batch_size':500,'total_count':50000}"
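
To get a feel for what those two variables imply, the number of batches is the total count divided by the batch size, rounded up. The sketch below assumes the playbook simply works through total_count Jobs in groups of batch_size; batch_count is a hypothetical helper, not something the playbook provides.

```shell
# Hypothetical helper: how many batches are needed to cover total_count
# Jobs when each batch holds at most batch_size Jobs (ceiling division).
batch_count() {
  total=$1
  size=$2
  echo $(( (total + size - 1) / size ))
}

batch_count 50000 500   # prints 100
```

Smaller batches mean more rounds but fewer Jobs on the cluster at any one time.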

Check on how many jobs have completed by monitoring:

kubectl get jobs -l type=50k --field-selector status.successful=1
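
If you'd rather see a number than a list, you can count the rows that command returns. This is a sketch: count_completed is a hypothetical helper, and the count only covers Jobs that still exist on the cluster, so it is not a running total if finished Jobs are being deleted along the way.

```shell
# Hypothetical helper: count Jobs labeled type=50k that report one
# successful completion. --no-headers drops the column header row, so
# every remaining line is one Job. stderr is discarded to hide the
# "No resources found" message when nothing matches.
count_completed() {
  kubectl get jobs -l type=50k \
    --field-selector status.successful=1 \
    --no-headers 2>/dev/null | wc -l | tr -d ' '
}

# Example usage: print the count every 10 seconds (Ctrl-C to stop).
# while true; do count_completed; sleep 10; done
```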

Note: For efficiency's sake, each batch of jobs is deleted after it successfully runs (otherwise there seems to be a hard limit on how many Jobs/Pods can remain present on the cluster, and the scheduler grinds to a halt). If you want to leave all jobs in place, add the extra variable 'inflight_cleanup':false.

Disabling Inflight Cleanup of Jobs

Because I encountered issues when running more than 3,000-5,000 Jobs in a given cluster, I set up the playbook to run a batch of Jobs, then delete all those Jobs (and their orphaned Pods), then move on to the next batch. This allowed the playbook to deploy all 50,000 Jobs.

But you can bypass that 'inflight cleanup' of each batch by setting inflight_cleanup: false in the playbook extra vars, for example:

ansible-playbook main.yml --extra-vars="{'batch_size':500,'total_count':50000,'inflight_cleanup':false}"

Note that this configuration has not yet been successfully used to deploy more than 4,000-5,000 Jobs on a single cluster, but it does allow you to dump a LOT of Jobs into a cluster and see how far it can get.
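
If you do disable in-flight cleanup and the cluster bogs down, finished Jobs can be cleared out by hand. The sketch below is an assumed manual equivalent, not necessarily what the playbook itself runs; it reuses the type=50k label and field selector from the monitoring command, and cleanup_completed is a hypothetical name.

```shell
# Hypothetical helper: delete Jobs labeled type=50k that have completed
# successfully. kubectl's default cascading deletion also removes the
# Pods owned by each deleted Job.
cleanup_completed() {
  kubectl delete jobs -l type=50k \
    --field-selector status.successful=1
}

# Example usage:
# cleanup_completed
```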

Author

This project was created by Jeff Geerling, author of Ansible for DevOps and Ansible for Kubernetes, in support of a video celebrating 50,000 subscribers on his YouTube channel.