geerlingguy opened this issue 3 years ago
I just created the following issue upstream, to see if anyone has further ideas to help figure out why the initial idea didn't work: https://github.com/kubernetes/kubernetes/issues/95492
I'm going to run another cluster and get a time series graph of how long it takes per Job:
```shell
watch -n5 "kubectl get jobs -l type=50k --field-selector status.successful=1 | wc -l | awk -v date=\", \$(date)\" '{print \$1, date}' >> result.csv"
```
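One wrinkle worth noting (my own observation, not from the upstream issue): `wc -l` here also counts kubectl's header row, so each sample is off by one. kubectl's `--no-headers` flag avoids that. A sketch of the adjusted pipeline, with the `awk` stage broken out so it's clear what it does:

```shell
# Adjusted sampling command (a sketch; --no-headers keeps kubectl's header
# row out of the count, which the original pipeline includes):
#
#   watch -n5 "kubectl get jobs -l type=50k --field-selector status.successful=1 \
#     --no-headers | wc -l | awk -v date=\", \$(date)\" '{print \$1, date}' >> result.csv"

# The awk stage just appends a timestamp to the count, e.g.:
echo 7 | awk -v date=", $(date)" '{print $1, date}'
```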
I also asked Linode via a support ticket here: https://cloud.linode.com/support/tickets/14647575 (you have to be logged in as me to view it ;).
@geerlingguy that's awesome that you're keeping track of this and handing your findings to the Kubernetes maintainers. It would be awesome to see a screenshot of what Linode answered to your question (I guess the final conclusion after the investigation, or something).
@markrity - Don't worry, I'll keep things updated here :)
So to sum up some of the things I've learned:
Here are some of the graphs and the raw CSV data used to build them, generated by dumping the counts into a CSV with:

```shell
watch -n5 "kubectl get jobs -l type=50k --field-selector status.successful=1 | wc -l | awk -v date=\", \$(date)\" '{print \$1, date}' >> result.csv"
```
(Graphs: two runs with `--leader-elect=true` and two with `--leader-elect=false`.)
And here's a zip file containing all the CSV files:
The fact that the Linode graphs and the 2 GB RAM graphs from my Mac line up so perfectly, with the inflection point around 3,000 Jobs, makes me strongly suspect the Linode master has 2 GB of RAM by default.
I tried it in one of two ways:
In both cases, the Linode clusters seemed to hit some sort of wall around 3,000-5,000 Jobs. My local cluster died (see #3) just under 3,000 Jobs.
If I create a batch, then delete that batch (including all the orphaned Pods from it; for some reason deletion propagation wasn't happening in either 1.18 or 1.19, and it didn't seem like any owner references were set on the Job Pods), then move on, I can get up to 50,000 Jobs (and likely beyond).
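The batch-and-delete workflow can be sketched roughly like this, assuming Jobs are created with a hypothetical `batch=<n>` label (the creation step is elided). The explicit `kubectl delete pods` is the workaround for the deletion propagation that wasn't happening:

```shell
# Rough sketch of the per-batch cleanup. Assumes each Job (and its Pods)
# carries a hypothetical batch=<n> label; Job creation itself is elided.
cleanup_batch() {
  local batch="$1"
  # Delete the Jobs in this batch...
  kubectl delete jobs -l "batch=${batch}" --wait=false
  # ...and then the orphaned Pods, since deletion propagation wasn't
  # cleaning them up automatically.
  kubectl delete pods -l "batch=${batch}" --wait=false
}

# Usage, one batch at a time:
# for n in $(seq 1 10); do
#   # create the Jobs for batch $n here, then:
#   cleanup_batch "$n"
# done
```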
So my question is this: why does it seem like the scheduler starts to fall over at such a low number of Jobs? Surely there are clusters out there where people don't garbage-collect Jobs and there are many, many thousands of Jobs, right? (And I'm not talking about CronJobs here, just Jobs.)
I think I might open an issue in the K8s repo and see if there's any more official light to shine on this, since the docs are completely silent on any warnings about trying to run thousands of Jobs.