ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Set-up "Instance health checks" with graceful shut-down #154

Open ababaian opened 4 years ago

ababaian commented 4 years ago

There are edge cases of instance errors in which, say, the serratus-align container is not doing any meaningful work (measured by CPU%) and the shut-down procedures fail to catch this and gracefully close the instance and container. We rely on ec2-terminate for this graceful shutdown, but having a redundancy of sudo shutdown -h now or an equivalent function would be really nice.

One way to implement this is to add "health checks" for the instances: if CPU usage is, say, <5% for a sustained 5-10 minutes, the instance is terminated from outside. There are quite a few cases of serratus-align, serratus-dl and serratus-merge in which a few stragglers are left 'spooling' after scale-in or in the background during a run. In theory this will be a catch-all for several errors and reduce inefficiencies.
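One possible way to terminate idle instances "from outside" is a CloudWatch alarm on the instance's CPUUtilization metric with the built-in EC2 terminate action. The sketch below is only an illustration: the function name `create_idle_alarm`, the alarm name and the defaults are assumptions, not part of serratus, and it assumes the AWS CLI is configured with permission to create alarms that use EC2 actions. With basic monitoring the metric arrives every 5 minutes, so two periods of 300 s approximate the "sustained 5-10 minutes" suggested above.

```shell
#!/usr/bin/env bash
# Sketch: create a CloudWatch alarm that terminates an idle EC2 instance.
# The function name, alarm name and defaults are illustrative assumptions.

create_idle_alarm() {
    local instance_id="$1"
    local region="${2:-us-east-1}"
    local threshold="${3:-5}"   # percent CPU, from the "<5%" suggestion
    local periods="${4:-2}"     # 2 x 300 s = 10 minutes below threshold

    aws cloudwatch put-metric-alarm \
        --region "$region" \
        --alarm-name "serratus-idle-${instance_id}" \
        --alarm-description "Terminate instance when CPU stays below ${threshold}%" \
        --namespace AWS/EC2 \
        --metric-name CPUUtilization \
        --dimensions "Name=InstanceId,Value=${instance_id}" \
        --statistic Average \
        --period 300 \
        --evaluation-periods "$periods" \
        --threshold "$threshold" \
        --comparison-operator LessThanThreshold \
        --alarm-actions "arn:aws:automate:${region}:ec2:terminate"
}
```

It could be called once per instance at boot, for example `create_idle_alarm "$INSTANCE_ID" us-east-1`. One caveat with this design: an instance that is legitimately waiting for work also looks idle, so the threshold and period would need tuning against real runs.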

From serratus/containers/worker.sh

          shutdown)
            (
                flock 200

                echo "  Shutting down instance"
                # TODO: change to shutdown (see below)
                aws ec2 terminate-instances \
                 --region us-east-1 \
                 --instance-ids "$INSTANCE_ID"

                sleep 300

                # TODO: Add a redundancy for shutdown
                #       to work from inside the container
                #
                # Secondary back-up -- shutdown instance
                # (set to "stopped" state if terminate fails)
                # yum -y install sudo shadow-utils util-linux
                # sudo shutdown -h now
                # sleep 300

                false
                exit 0

            ) 200> "$BASEDIR/.shutdown-lock"
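As a complement to the terminate call in the excerpt above, the redundancy could also live on the instance itself as a small CPU watchdog. The following is a minimal sketch, not the serratus implementation: the function names (`read_cpu_line`, `cpu_busy_pct`, `sustained_idle`, `watchdog`) and the defaults are hypothetical, it reads the aggregate counters from Linux `/proc/stat`, and it assumes the script runs where `sudo shutdown -h now` can actually halt the host (from inside an unprivileged container it normally cannot).

```shell
#!/usr/bin/env bash
# Sketch of an in-instance idle watchdog. All names are illustrative.

# Print the aggregate "cpu" counters line from /proc/stat.
read_cpu_line() {
    grep '^cpu ' /proc/stat
}

# Given two "cpu ..." lines sampled some time apart, print the integer
# percentage of CPU time that was busy between the two samples.
cpu_busy_pct() {
    local -a a b
    read -r -a a <<< "$1"
    read -r -a b <<< "$2"
    local i total=0 idle
    # Fields 1-8: user nice system idle iowait irq softirq steal
    for i in 1 2 3 4 5 6 7 8; do
        total=$(( total + ${b[i]:-0} - ${a[i]:-0} ))
    done
    # idle (field 4) and iowait (field 5) count as not busy
    idle=$(( ${b[4]:-0} - ${a[4]:-0} + ${b[5]:-0} - ${a[5]:-0} ))
    if [ "$total" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( 100 * (total - idle) / total ))
}

# Succeed if the most recent REQUIRED samples are all below THRESHOLD.
# Usage: sustained_idle THRESHOLD REQUIRED sample [sample ...]
sustained_idle() {
    local threshold="$1" required="$2"
    shift 2
    local streak=0 s
    for s in "$@"; do
        if [ "$s" -lt "$threshold" ]; then
            streak=$(( streak + 1 ))
        else
            streak=0
        fi
    done
    [ "$streak" -ge "$required" ]
}

# Sample CPU every INTERVAL seconds; halt after REQUIRED idle samples in a row.
watchdog() {
    local threshold="${1:-5}" required="${2:-10}" interval="${3:-60}"
    local -a samples=()
    local prev curr
    prev="$(read_cpu_line)"
    while true; do
        sleep "$interval"
        curr="$(read_cpu_line)"
        samples+=( "$(cpu_busy_pct "$prev" "$curr")" )
        prev="$curr"
        # Keep only the newest REQUIRED samples.
        if [ "${#samples[@]}" -gt "$required" ]; then
            samples=( "${samples[@]:1}" )
        fi
        if sustained_idle "$threshold" "$required" "${samples[@]}"; then
            echo "  CPU below ${threshold}% for ${required} samples; shutting down"
            sudo shutdown -h now
            return
        fi
    done
}
```

Started in the background as `watchdog 5 10 60 &`, this would halt the instance after roughly 10 minutes below 5% CPU. Whether the halt results in a stopped or a terminated instance depends on the instance's shutdown behavior setting.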