adobe / aquarium-fish

Your best secure distributed heterogeneous dynamic compute resource manager for CI

AWS driver: dedicated hosts management #60

Status: Open. sparshev opened this issue 2 months ago

sparshev commented 2 months ago

The AWS driver needs to be able to manage dedicated hosts, primarily for Mac machines (since they can't run as regular instances the way Linux/Windows machines can). Overall, in the future it should be able to manage different types of dedicated hosts and implement a couple of important cost optimizations (bringing the Mac cost on par with the rest of the AWS resources).

Why?

There are a couple of issues with Mac dedicated hosts:

  1. 24h minimum allocation: you can't release the dedicated host until it's 24h old. This is a restriction of Apple's license.
  2. Mandatory scrubbing: even if you don't plan to release the dedicated host, each time you stop or terminate the instance running on it, the host goes through a scrubbing (cleaning) process that takes ~1-2h. There is currently no opt-out of that process, which would be useful if you don't plan to update the images frequently or want to preserve the local disk (for caching, for example). One useful side effect: we're not paying for the dedicated host while it's scrubbing.

Ideally, there would be an opt-out of the mandatory scrubbing process, but AWS has put this feature in its backlog. AWS proposed an alternative, root disk replacement during reboot, but it has a couple of flaws:

Solution

A tricky optimization will be implemented as part of this task: saving budget by using the scrubbing procedure (this was discussed with AWS, and we got approval for the workaround). The dedicated hosts scrubbing procedure will be used to keep the machines allocated but off when they are not needed. In general, when a dedicated host has not been used for a certain amount of time (e.g. 5 minutes), it:

This way it's possible to maintain a pool of dedicated machines at a relatively low cost, really close to that of regular Linux/Windows instances, while keeping the same dynamic workflow. In the future, once AWS implements the scrubbing opt-out, we will potentially be able to utilize the existing machines even more, but that will be done as a separate activity.

sparshev commented 2 weeks ago

The change is ready. In theory it should work with any dedicated hosts, but it was primarily aimed at Mac.

sparshev commented 6 days ago

A new version of Fish with dedicated hosts management was released (v0.7.3), but there is one more thing I forgot: quota verification. And of course I saw an error during resource allocation when the quota said "nope". I need to add it as a warning on Fish startup and use the lower of the pool max and the dedicated host quota as the effective maximum.
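The planned fix (warn at startup and cap the pool by the quota) could look roughly like this. A minimal sketch: `effectiveHostLimit` is a hypothetical helper, and fetching the actual quota value from AWS is left out.

```go
package main

import (
	"fmt"
	"log"
)

// effectiveHostLimit returns the lower of the configured pool maximum and
// the dedicated host quota, logging a warning when the quota is the limit.
func effectiveHostLimit(poolMax, quota int) int {
	if quota < poolMax {
		log.Printf("WARN: dedicated host quota (%d) is lower than pool max (%d), using %d",
			quota, poolMax, quota)
		return quota
	}
	return poolMax
}

func main() {
	fmt.Println(effectiveHostLimit(10, 4)) // 4 (and logs a warning)
	fmt.Println(effectiveHostLimit(3, 8))  // 3
}
```

Checking this once at startup surfaces the misconfiguration early, instead of discovering it through a failed allocation.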

sparshev commented 6 days ago

I also checked the price: keeping a Mac alive for 24h with dedicated hosts management costs ~10% of the regular price (the public price of mac2-m1ultra.metal is $5/h, so it's ~$12 vs $120 per 24h). A good deal, but there is still room for further optimization: