MatterMiners / cobald

Cobald is an Opportunistic Balancing Deamon
https://cobald.readthedocs.io
MIT License
11 stars 12 forks source link

Fix stepwise controller documentation #93

Closed mschnepf closed 2 years ago

mschnepf commented 3 years ago

The supply is defined by the resources and is usually constant, as it is in the current documentation. Therefore, it can happen that the demand increase very slow. It is better to set the new demand based on the current demand.

giffels commented 3 years ago

@mschnepf I would suggest to wait for @maxfischer2781's review before merging.

mschnepf commented 3 years ago

@RHofsaess the demand has a range of 0 to max float. Usually, a standardized is used to define a range demand >1 to get at least one resource. If the latency of the resource requests is low, demand and supply will be similar. If the latency of the resource requests is high (e.g. long job queue in an HPC center), demand and supply will differ more over time.

The change in demand via the multiplication factor in line 123 will be used when the supply is bigger than 100. Therefore, the change will be bigger than 1 (change by the other controller) as long as the demand / supply is higher than 10. This should be usually the case.

maxfischer2781 commented 3 years ago

Haven't forgotten about this, just didn't have time to reply properly yet. Stay tuned.

eileen-kuehn commented 3 years ago

I do not really get what is the invalid part in the documentation. For me it seems perfectly fine to define this via supply. @mschnepf could you explain what is the issue and what you would expect otherwise? I also don't see an issue with slowly increasing demands that are dependent on maybe slowly increasing/decreasing supply.

The main issue I see is that you might artificially raise/lower the demand completely independent of reality aka supply. In some cases supply will not be able to match the demand at all. What is it, that you intend? Rising priority?

maxfischer2781 commented 3 years ago

@mschnepf Can you clarify why you think depending on the supply is wrong? Not needlessly increasing the demand when the resource cannot actually satisfy it doesn't seem wrong to me.

I'm currently against making this change. The example is in effect a fixed/ratio overbooking; it might or might not apply to the HEP usecase, but it seems like a desirable strategy for many situations.

mschnepf commented 3 years ago

TARDIS requests more resources when the demand >= supply+size_of_resource. In the current documentation of the stepwise controller, the demand is the supply plus an offset. If this offset is smaller than the size of a resource no further resources will be requested. In such a case, the demand and supply stay at that level forever.

COBalD config:

from cobald.controller.linear import LinearController
from cobald.decorator.limiter import Limiter
from cobald.decorator.logger import Logger
from cobald.decorator.standardiser import Standardiser
from tardis.resources.poolfactory import create_composite_pool
from cobald.controller.stepwise import *

@stepwise
def control(pool: Pool, interval):
    return 10 

# additional rules above specific supply thresholds
@control.add(supply=1)
def start(pool: Pool, interval):
    if pool.utilisation < 0.45:
        return pool.supply - 1 
    elif pool.allocation > 0.9:
        return pool.supply + 1

@control.add(supply=200)
def linear(pool: Pool, interval):
    if pool.utilisation < 0.8:
        return pool.supply - 1
    elif pool.allocation > 0.9:
        return pool.supply + 1

pipeline = control.s(interval=1) >> Limiter.s(minimum=1)  >> Logger.s(name='changes') >> create_composite_pool('tardis_config.yml')

TARDIS config:

BatchSystem:
    adapter: FakeBatchSystem
    allocation: 1.0
    utilisation: 1.0
    machine_status: Available

Sites:
    - name: test-site
      adapter: FakeSite
      quota: 1000

test-site: 
    StartupCommand: startVM.py
    api_response_delay: !RandomGauss
                        mu: 0.1
                        sigma: 0.0
    resource_boot_time: !RandomGauss
                        mu: 1 
                        sigma: 0
    MachineTypes:
        - short
    MachineTypeConfiguration:
        short:
            Walltime: '1:00:00'
            delay: 2
    MachineMetaData:
        short:
            Cores: 8
            Memory: 110
            Disk: 100

I think the basic example in that documentation should be with demand, similar to the liner controller. In addition, an example with supply can be shown.

eileen-kuehn commented 3 years ago

TARDIS requests more resources when the demand >= supply+size_of_resource. In the current documentation of the stepwise controller, the demand is the supply plus an offset. If this offset is smaller than the size of a resource no further resources will be requested. In such a case, the demand and supply stay at that level forever.

One short question: In the context of TARDIS this might be perfectly valid. But this is the documentation of COBalD explaining the basic concept of the LinearController. For this purpose I don't see any issues. You might consider extending the documentation of TARDIS instead.

maxfischer2781 commented 3 years ago

Keeping in mind the motivation for this change, would it be a compromise just to suggest a larger supply increase? The initial idea of the example was to show "10% or at least 10", not the "10% or at least 1" that actually made it into the docs.

        @control.add(supply=10)
         def quantized(pool: Pool, interval):
             if pool.utilisation < 0.5:
                 return pool.supply - 10
             elif pool.allocation > 0.5:
                 return pool.supply + 10
mschnepf commented 3 years ago

TARDIS uses a Standardiser with granularity= number of CPU-cores. Therefore, further resources are only requested if demand>supply+granularity. In the case of the shown example of the stepwise controller, the demand would not increase if the granularity is bigger than 1 while the supply is below 200. I created a PR for TARDIS https://github.com/MatterMiners/tardis/pull/209 which removes the granularity per default.

Therefore, I'm fine with the original example of the stepwise controller. Is a comment at the standardized useful to describe this behaviour?