Open sparshev opened 2 months ago
The change is prepared. In theory it could work with any dedicated hosts, but it is primarily aimed at Mac.
A new version of Fish with dedicated hosts management was released (v0.7.3), but there is one more thing I forgot: quota verification. And of course I saw an error during resource allocation when the quota said "nope". We need to add a warning at Fish startup and use the smaller of the pool max and the quota value as the dedicated host maximum.
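A minimal sketch of that quota check, assuming the pool max and the account quota are already known as plain integers. The function name, the logger name and the parameters are hypothetical and only illustrate the idea; this is not the actual Fish code:

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("dedicated_pool")


def effective_host_limit(pool_max: int, quota: int) -> int:
    """Return how many dedicated hosts the pool may actually allocate.

    The limit is the smaller of the configured pool max and the account
    quota. A warning is logged when the quota is the limiting value, so
    the mismatch is visible at startup and not only on a failed allocation.
    """
    if quota < pool_max:
        logger.warning(
            "Dedicated host quota (%d) is lower than the pool max (%d); "
            "using the quota as the limit",
            quota,
            pool_max,
        )
    return min(pool_max, quota)
```

With a pool max of 10 and a quota of 4 this would return 4 and log the warning once.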
I also checked the price: keeping the Mac alive for 24h with dedicated hosts management costs ~10% of the regular price (the mac2-m1ultra.metal public price is $5/h, so it's ~$12 vs $120). Good deal, but there is still room for further optimization:
- work_cron + work_duration: define the schedule of the working hours, so that scrubbing_delay is minimal during off-hours. This will help to save money at night and on weekends.
- keep_amount: keep some capacity available during off-hours, with the regular or an increased scrubbing_delay, for potential night-time workload.
- scrubbing_capacity: scale scrubbing_delay by a cubic function of a 0.0-1.0 ratio, so the delay stays low while we have non-allocated hosts and comes into full power, helping to reuse the hosts, only when we reach the capacity limit. If scrubbing_delay=10m, max=100 and scrubbing_capacity=20:
  - scrubbing_delay will be close to 0 min, because we have untouched capacity
  - scrubbing_delay = 1m15s, because we're halfway to the scrubbing_capacity threshold and the cube is 0.125
  - scrubbing_delay = 4m13s, because we used 75% of the capacity and the cube is 0.421875 (about 4.22 min of the 10 min)
  - scrubbing_delay = 10 min, since we reached the scrubbing_capacity threshold
- scrubbing_delay_budget: set the maximum time the host may spend in the scrubbing delay state. When the budget is exhausted, the host starts scrubbing right away until the end of the host's life. We will probably need some sort of exception here for the case when the unallocated capacity is 0, there are no more hosts with a budget, and we still need some minimal capacity available...
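The cubic scaling in the scrubbing_capacity example above can be sketched as follows. This is only an illustration: the function name and the `progress` parameter are hypothetical, and how `progress` is derived from max, scrubbing_capacity and the current host counts is left to the caller, since it is not spelled out here.

```python
from datetime import timedelta


def scaled_scrubbing_delay(base_delay: timedelta, progress: float) -> timedelta:
    """Scale the configured scrubbing delay by the cube of `progress`.

    `progress` is assumed to be the fraction (0.0-1.0) of the way to the
    scrubbing_capacity threshold. Values outside that range are clamped,
    so the result never exceeds `base_delay` and never goes negative.
    """
    clamped = min(max(progress, 0.0), 1.0)
    return base_delay * clamped ** 3


if __name__ == "__main__":
    base = timedelta(minutes=10)
    # Print the delay for the four cases listed above.
    for progress in (0.0, 0.5, 0.75, 1.0):
        print(progress, scaled_scrubbing_delay(base, progress))
```

The cube keeps the delay near zero for most of the range: at 50% only 12.5% of the configured delay applies, and the full 10 minutes is reached only at the threshold itself.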
The AWS driver needs to be able to manage dedicated hosts, primarily for Mac machines (because, unlike Lin/Win, they cannot run as regular instances). In the future it could also manage other types of dedicated hosts and implement a couple of important cost optimizations (bringing the Mac cost on par with the rest of the AWS resources).
Why?
There are a couple of issues with the dedicated hosts of Mac:
Ideally we would have an opt-out from the mandatory scrubbing process, but AWS put this feature in the backlog. AWS proposed an alternative, root disk replacement during reboot, but it has a couple of flaws:
Solution
A tricky optimization will be implemented as part of this task: saving budget with the scrubbing procedure (it was discussed with AWS and we got approval for this workaround). The dedicated host scrubbing procedure will be used to keep the machines allocated but off when they are not needed. In general, when a dedicated host has not been used for a certain amount of time (like 5 minutes), it:
This way it's possible to maintain the pool of dedicated machines at a relatively low cost, really close to that of regular Lin/Win instances, and keep the same dynamic workflow. In the future, when AWS implements the scrubbing opt-out, we will potentially be able to utilize the existing machines even more, but that will be done as a separate activity.