epam / cloud-pipeline

Cloud agnostic genomics analysis, scientific computation and storage platform
https://cloud-pipeline.com
Apache License 2.0

Add grace period support to grid engine autoscaler #3476

Closed tcibinan closed 3 months ago

tcibinan commented 3 months ago

Background

Currently, the grid engine autoscaler scales down workers that are in invalid states. If a node becomes unavailable, the autoscaler scales it down immediately, which gives the node no chance to recover.

Approach

The grid engine autoscaler should be able to postpone the termination of unavailable nodes, in the same manner as in #3297.
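
To illustrate the idea, here is a minimal sketch of how a grace period could work. It is not the actual autoscaler code and does not reproduce the approach from #3297. The class name `GracePeriodTracker`, the method `should_terminate`, and the host names are all hypothetical. The sketch assumes the autoscaler periodically checks each worker and can tell whether it is currently available.

```python
from datetime import datetime, timedelta


class GracePeriodTracker:
    """Hypothetical helper: remembers when each host was first seen unavailable."""

    def __init__(self, grace_period: timedelta):
        self._grace_period = grace_period
        self._first_seen_unavailable = {}

    def should_terminate(self, host: str, available: bool, now: datetime) -> bool:
        if available:
            # The host has recovered, so forget any earlier outage.
            self._first_seen_unavailable.pop(host, None)
            return False
        # Record the first moment the host was observed unavailable.
        first_seen = self._first_seen_unavailable.setdefault(host, now)
        # Terminate only once the outage has lasted the whole grace period.
        return now - first_seen >= self._grace_period


tracker = GracePeriodTracker(grace_period=timedelta(minutes=10))
start = datetime(2024, 1, 1, 12, 0, 0)

# Assumed scenario: the worker goes down, stays down, then a second one recovers.
decisions = [
    tracker.should_terminate("worker-1", False, start),
    tracker.should_terminate("worker-1", False, start + timedelta(minutes=5)),
    tracker.should_terminate("worker-1", False, start + timedelta(minutes=10)),
]
recovered = tracker.should_terminate("worker-2", True, start)
```

With a grace period of zero the tracker terminates a node on the first failed check, which matches the current immediate scale-down behavior.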