ciao-project / ciao

Ciao - Cloud Integrated Advanced Orchestrator
Apache License 2.0

resource estimation #101

Open tpepper opened 8 years ago

tpepper commented 8 years ago

User workloads request some amount of resource, eg: 4 vCPUs or 8GB of RAM. For long-lived or frequently run workloads, comparing the requested amount against actual usage lets us establish trends and act on that information for better cloud performance. Eg:

The workload may not actually use all of the requested resource, in which case knowing this trend enables us to more successfully overcommit.

The workload may use all of the requested resource, in which case the user could be informed that allocating more resource may allow their workload to run more efficiently.
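A minimal sketch of how the two cases above could be told apart, assuming we have a series of usage samples per workload in the same unit as the request. The `Estimate` function, the `Advice` type, and both watermark thresholds are hypothetical and chosen only for illustration; none of this is existing ciao code.

```go
package main

import "fmt"

// Advice is a hypothetical classification of a workload's resource request.
type Advice int

const (
	AdviceNone       Advice = iota // request roughly matches usage
	AdviceOvercommit               // request is mostly unused
	AdviceGrow                     // usage is pinned at the request
)

func (a Advice) String() string {
	switch a {
	case AdviceOvercommit:
		return "overcommit candidate"
	case AdviceGrow:
		return "suggest larger request"
	default:
		return "no action"
	}
}

// Illustrative thresholds (assumptions, not tuned values).
const (
	lowWatermark  = 0.50 // peak below 50% of request: safe to overcommit
	highWatermark = 0.95 // peak at or above 95% of request: workload is constrained
)

// Estimate compares the requested amount with the observed peak usage.
// Both values must use the same unit (eg: MiB of RAM or vCPUs).
func Estimate(requested float64, samples []float64) Advice {
	if requested <= 0 || len(samples) == 0 {
		return AdviceNone
	}
	peak := samples[0]
	for _, s := range samples[1:] {
		if s > peak {
			peak = s
		}
	}
	switch ratio := peak / requested; {
	case ratio < lowWatermark:
		return AdviceOvercommit
	case ratio >= highWatermark:
		return AdviceGrow
	default:
		return AdviceNone
	}
}

func main() {
	// 8192 MiB requested, usage samples in MiB.
	fmt.Println(Estimate(8192, []float64{1500, 2100, 1800}))
	fmt.Println(Estimate(8192, []float64{7900, 8192, 8100}))
}
```

Using the peak rather than the mean is a deliberately conservative choice here: a workload that briefly needs its full request should not be treated as an overcommit candidate.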

We currently report per-workload resource usage over time from launcher to controller, but we don't analyze it. The data doesn't feed into user-facing info, doesn't influence scheduler placement, and doesn't trigger opportunistic actions at the launcher level (eg: there are all manner of technologies available to proactively implement QoS or opportunistically reclaim unused resources).

tpepper commented 8 years ago

As with issue #100, we likely should add an optional config parameter for memory overcommit. CPU is renewable, and overcommitting it is mostly non-fatal (ie: things run, just slower). Memory is different: workloads page in memory on use, so an 8GB VM will not necessarily consume all of that 8GB. Tracking workloads over time can allow us to measure real versus requested resource usage. A RAM overcommit knob would allow the gap of unused resource to be more safely overcommitted to other workloads. The risk with RAM overcommit is workloads failing when paging fails, or workloads running from swap instead of RAM and the horrible performance that comes with it. Ie: RAM overcommit without the feedback loop of guidance from a resource estimation analysis system (and without any evacuation/migration support) is dangerous.
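As a rough sketch of how such a knob could be tied to the feedback loop, the placement check below lets requested memory exceed physical RAM up to a configured ratio, but only while the estimated real usage still fits in physical RAM. The `Node` type, its fields, and the `OvercommitRatio` parameter are all hypothetical names for illustration, not ciao's actual scheduler or config.

```go
package main

import "fmt"

// Node is a hypothetical view of a compute node's memory accounting, in MiB.
type Node struct {
	PhysicalMiB     int     // RAM physically present on the node
	RequestedMiB    int     // sum of memory requested by placed workloads
	EstimatedUseMiB int     // sum of estimated real usage of placed workloads
	OvercommitRatio float64 // hypothetical config knob; 1.0 means no overcommit
}

// CanPlace reports whether a new workload fits on the node.
// Requests may exceed physical RAM up to OvercommitRatio, but the estimated
// real usage of all workloads must still fit in physical RAM.
func (n Node) CanPlace(requestMiB, estimatedUseMiB int) bool {
	ratio := n.OvercommitRatio
	if ratio < 1.0 {
		ratio = 1.0 // treat an unset or invalid knob as "overcommit disabled"
	}
	limit := int(float64(n.PhysicalMiB) * ratio)
	if n.RequestedMiB+requestMiB > limit {
		return false
	}
	return n.EstimatedUseMiB+estimatedUseMiB <= n.PhysicalMiB
}

func main() {
	n := Node{PhysicalMiB: 16384, RequestedMiB: 16384, EstimatedUseMiB: 6000, OvercommitRatio: 1.5}

	// An 8GB request estimated to really use about 3GB fits under a 1.5x ratio.
	fmt.Println(n.CanPlace(8192, 3000))

	// The same request is refused when overcommit is disabled.
	n.OvercommitRatio = 1.0
	fmt.Println(n.CanPlace(8192, 3000))
}
```

The second condition is what keeps the knob from being a blind multiplier: without an estimate of real usage, the ratio alone says nothing about whether the node will end up paging to swap.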