atlassian / escalator

Escalator is a batch- or job-optimized horizontal autoscaler for Kubernetes
Apache License 2.0
662 stars, 59 forks

WIP: Support ASG with zero size #148

Closed: skydiator closed this 5 years ago

skydiator commented 5 years ago

@Jacobious52 This is the initial work to support ASGs with zero size. Most of my testing was done against a real EKS environment, and I have found no issues with the implementation so far.

skydiator commented 5 years ago

Going to troubleshoot and fix the failing tests.

Jacobious52 commented 5 years ago

Hi @skydiator, we haven't heard from you in a while, and we'd like to make progress on this feature because we have use cases for it as well. There are a number of additions and changes we'd like to make, especially adding scale-down and a performant scale-up.

We'd like to build on your work by branching off it, so that your contributions so far are preserved. Let us know if you have any concerns with this.

skydiator commented 5 years ago

@Jacobious52, sorry, I've been too busy with work and family. I've made quite a few changes to fit the logic. I'm at the point where the Controller can live-fetch the CPU and memory of different instance types and calculate the capacity needed for all pending pods. With this approach, we don't have to worry about caching. There's still some work to be done.
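To illustrate the capacity calculation described above: when an ASG is at zero size there are no existing nodes to read allocatable capacity from, so the per-node capacity has to come from the instance type itself. The sketch below is a minimal, hypothetical version of that idea and not the code in this PR. The types `InstanceType` and `PodRequest`, the function `NodesNeeded`, and the example instance name and sizes are all assumptions for illustration. It sums the pending pods' CPU and memory requests, divides each by the instance type's capacity (rounding up), and takes the larger of the two.

```go
package main

import (
	"errors"
	"fmt"
)

// InstanceType is a hypothetical view of one instance type's allocatable
// capacity, as it might be fetched live from the cloud provider.
type InstanceType struct {
	Name        string
	MilliCPU    int64 // allocatable CPU in millicores
	MemoryBytes int64 // allocatable memory in bytes
}

// PodRequest holds the summed resource requests of one pending pod (hypothetical).
type PodRequest struct {
	MilliCPU    int64
	MemoryBytes int64
}

// ceilDiv returns ceil(a/b) for a >= 0 and b > 0.
func ceilDiv(a, b int64) int64 {
	return (a + b - 1) / b
}

// NodesNeeded estimates how many nodes of type it are required to fit the
// total requests of all pending pods. It divides aggregate CPU and memory by
// per-node capacity and takes the larger result. It ignores bin-packing
// fragmentation, so it can underestimate when pods are large relative to nodes.
func NodesNeeded(pending []PodRequest, it InstanceType) (int64, error) {
	if it.MilliCPU <= 0 || it.MemoryBytes <= 0 {
		return 0, errors.New("instance type has no allocatable capacity")
	}
	var cpu, mem int64
	for _, p := range pending {
		// A pod that cannot fit on a single node of this type can never be scheduled on it.
		if p.MilliCPU > it.MilliCPU || p.MemoryBytes > it.MemoryBytes {
			return 0, fmt.Errorf("pod request exceeds capacity of %s", it.Name)
		}
		cpu += p.MilliCPU
		mem += p.MemoryBytes
	}
	byCPU := ceilDiv(cpu, it.MilliCPU)
	byMem := ceilDiv(mem, it.MemoryBytes)
	if byCPU > byMem {
		return byCPU, nil
	}
	return byMem, nil
}

func main() {
	// Illustrative numbers only: a 2 vCPU / 8 GiB instance type.
	it := InstanceType{Name: "example.large", MilliCPU: 2000, MemoryBytes: 8 << 30}
	pending := []PodRequest{
		{MilliCPU: 1500, MemoryBytes: 1 << 30},
		{MilliCPU: 1500, MemoryBytes: 1 << 30},
		{MilliCPU: 500, MemoryBytes: 6 << 30},
	}
	n, err := NodesNeeded(pending, it)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n) // prints 2: CPU (3500m over 2000m per node) is the binding dimension
}
```

Taking the maximum over CPU and memory means whichever resource runs out first decides the node count. A real implementation would likely also need to account for per-pod fit and daemonset overhead, which this sketch leaves out.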

Jacobious52 commented 5 years ago

Thanks for the quick reply. No worries, we currently have resources dedicated to getting this off the ground, so we can do the MVP work now and build on what you've given us so far.

We decided it would work best to merge this PR into a separate feature branch, which we can then branch off to finish the feature. This way your commits remain part of the history, and you can open additional PRs if you want, until the feature is merged into master.

skydiator commented 5 years ago

@Jacobious52 Oh, okay. I'm fine with that. Do you need me to commit what I've got so far?

Jacobious52 commented 5 years ago

@skydiator, yes please. Just push what you've got and we can work from there :)

skydiator commented 5 years ago

@Jacobious52 After some cleanup, this is what I've got so far. Basically, I'm fetching the instance type details on every RunOnce loop to ensure they are always up to date. Using those instance type details, we can calculate the capacity and the delta needed for the pending pods. The capacity and delta calculations, as well as the test cases, still need some work. Hope this helps.
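The trade-off described above (re-fetching instance type details on every loop instead of caching them) could be sketched roughly as follows. This is a hypothetical illustration, not Escalator's controller: only the name `RunOnce` comes from the comment, while `InstanceTypeFetcher`, `Controller`, its fields, and the fake `countingFetcher` are assumptions. Each `RunOnce` call performs a fresh lookup, then turns the pending CPU and memory totals into a node delta, clamped so the ASG never grows past its maximum size. Starting from `CurrentSize: 0` mirrors the zero-size ASG case.

```go
package main

import "fmt"

// InstanceType is a hypothetical capacity record for one instance type.
type InstanceType struct {
	MilliCPU    int64
	MemoryBytes int64
}

// InstanceTypeFetcher is a hypothetical stand-in for a live cloud-provider
// lookup; it is not an Escalator or AWS SDK API.
type InstanceTypeFetcher interface {
	Fetch(name string) (InstanceType, error)
}

// Controller is a simplified, hypothetical stand-in for the autoscaler loop.
type Controller struct {
	Fetcher      InstanceTypeFetcher
	InstanceType string
	CurrentSize  int64 // nodes currently in the ASG (may be 0)
	MaxSize      int64 // ASG maximum size
}

func ceilDiv(a, b int64) int64 { return (a + b - 1) / b }

func maxInt64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// RunOnce fetches fresh instance type details on every call (no cache) and
// returns how many nodes to add for the pending CPU and memory totals.
func (c *Controller) RunOnce(pendingMilliCPU, pendingMemBytes int64) (int64, error) {
	it, err := c.Fetcher.Fetch(c.InstanceType)
	if err != nil {
		return 0, fmt.Errorf("fetching %s: %w", c.InstanceType, err)
	}
	if it.MilliCPU <= 0 || it.MemoryBytes <= 0 {
		return 0, fmt.Errorf("instance type %s reports no capacity", c.InstanceType)
	}
	needed := maxInt64(ceilDiv(pendingMilliCPU, it.MilliCPU), ceilDiv(pendingMemBytes, it.MemoryBytes))
	// Clamp the delta to the remaining headroom in the ASG.
	headroom := maxInt64(c.MaxSize-c.CurrentSize, 0)
	if needed > headroom {
		needed = headroom
	}
	return needed, nil
}

// countingFetcher is a fake fetcher that records how often it is called.
type countingFetcher struct {
	types map[string]InstanceType
	calls int
}

func (f *countingFetcher) Fetch(name string) (InstanceType, error) {
	f.calls++
	it, ok := f.types[name]
	if !ok {
		return InstanceType{}, fmt.Errorf("unknown instance type %q", name)
	}
	return it, nil
}

func main() {
	f := &countingFetcher{types: map[string]InstanceType{
		"example.large": {MilliCPU: 2000, MemoryBytes: 8 << 30},
	}}
	c := &Controller{Fetcher: f, InstanceType: "example.large", CurrentSize: 0, MaxSize: 3}
	for i := 0; i < 2; i++ {
		delta, err := c.RunOnce(7000, 4<<30)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println("delta:", delta) // 4 nodes needed by CPU, clamped to MaxSize 3
	}
	fmt.Println("fetches:", f.calls) // one fetch per RunOnce call
}
```

Fetching on every loop keeps capacity data current at the cost of one provider call per iteration; a cache with a TTL would reduce API calls but reintroduce the staleness concern mentioned above.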