Closed da-ekchajzer closed 1 year ago
@samuelrince I would be interested in your opinion.
I have dug a little deeper using the original spreadsheet. It looks like fitting one logarithm-like model per CPU "model" or "family" (Platinum, Gold, Silver, etc.) could be a good idea?
Here the red one is line 6 from the spreadsheet, corresponding to c5.metal*. I don't know what the star means here, nor why the red curve stands out that much. It looks like it should be on the other graph.
@da-ekchajzer, let me know what you think of this approach. I guess if we can have the CPU model name we will have a better estimation of what the power consumption profile should look like for new CPU models.
Also I don't understand the second table, it's not the power consumption per CPU architecture right?
Architecture | Min Watts (0%) | Max Watts (100%)
---|---|---
Skylake | 0.6446044454253452 | 4.193436438541878
Broadwell | 0.7128342245989304 | 3.6853275401069516
Haswell | 1.9005681818181814 | 6.012910353535353
EPYC 2nd Gen | 0.4742621527777778 | 1.6929615162037037
Cascade Lake | 0.6389493581523519 | 3.9673047343937564
EPYC 3rd Gen | 0.44538981119791665 | 2.0193277994791665
Ivy Bridge | 3.0369270833333335 | 8.248611111111112
Sandy Bridge | 2.1694411458333334 | 8.575357663690477
The second table represents the average consumption of a server depending on the CPU family (also called architecture).
My idea is to generate server consumption profiles per CPU family at first, until we gather data on specific CPUs (and other components) to build consumption profiles based on more precise data (number of cores, CPU model, …).
But either way, we should come up with a generic way of generating a consumption profile from a workload object as I mention above. As I saw in your graph, you use a linear approach to connect each point to its successor. IMHO this approach is limited:
When few points are given (in the case of the second table, for instance) the consumption profile will be an affine function.
Example for Skylake family
"workload":{
"0":0.6446044454253452,
"100": 4.193436438541878
}
consumption_profil(x) = ((4.193436438541878 - 0.6446044454253452) / (100 - 0)) * x + 0.6446044454253452
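For illustration, this affine approach can be sketched like this, using the Skylake min/max values from the table above (just the two-point straight line, not a real profile):

```python
# Sketch of the affine consumption profile for the Skylake family, built
# from the two points (0 %, min watts) and (100 %, max watts) of the table.

def affine_profile(min_watts: float, max_watts: float):
    """Return consumption_profil(x) as a straight line between the two points."""
    slope = (max_watts - min_watts) / (100 - 0)
    return lambda workload: slope * workload + min_watts

skylake = affine_profile(0.6446044454253452, 4.193436438541878)

print(skylake(0))    # min watts at idle
print(skylake(100))  # max watts at full load
```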
Yet, we know the consumption profile is not linear.
Besides, with this approach we won't come up with a function defined by its coefficients, but with a set of affine functions connecting one point to the next.
Cloud Carbon Footprint uses an affine equation (as seen above) to come up with an average watts consumption: Average Watts(x) = Min Watts + x * (Max Watts - Min Watts)
I think with the AWS data from teads and the future data we'll have we can be more ambitious and generate more precise functions.
What do you think of using a logarithmic regression
(which I am not familiar with) based on the workload object and previous consumption_profil ?
Does this make sense to you ?
I think a logarithmic function is good enough to model the CPU power consumption given the workload, and I understand that often we will only have the min (idle) and max (100%) power consumption.
The idea is to have a more precise model if we have access to the CPU model. Specifically, in Intel's case, if we know the CPU is a Xeon Platinum, Xeon Gold, or Xeon Silver, we can use a different base model to compute the "Final model", based on min and max power consumption.
Here is an example:
We receive from the API, both the CPU model and workload as following:
{
  "cpu": {
    "model": "Intel Xeon Platinum 8124M"
  },
  "workload": {
    "0": 51,    <<< Power consumption in idle state (in W)
    "100": 413  <<< Power consumption in 100% state (in W)
  }
}
(Maybe not the actual json fields here)
Given that we know it is a Xeon Platinum CPU and we can use a more precise model previously fitted on Xeon Platinum CPU data only, see the following:
Here the white curve called "Platinum model" is a power consumption model inferred from all power consumption curves for Xeon Platinum CPUs.
We can then build a second model called "Final model" using the "Platinum model" and min and max power consumption values. We build the following model in pink:
(In blue it is the actual cpu power consumption model)
In the case where we don't have the CPU model, but only min/max workload, we can use a default model (still a log function) built from the whole power consumption datasets. This method will give less precise values but still better than an affine function.
The log function I use to fit in the data is:
power_consumption(workload) = a * ln(b * (workload + c)) + d
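As a quick illustration of this model's shape, here is a minimal sketch; the parameter values are made up for illustration, not fitted to any real CPU:

```python
import numpy as np

# The log model described above; a, b, c, d are the parameters to fit.
def power_consumption(workload, a, b, c, d):
    return a * np.log(b * (workload + c)) + d

# Illustrative (made-up) parameters, just to show the curve's shape:
# consumption rises steeply at low load then flattens out.
a, b, c, d = 85.0, 0.5, 2.0, 50.0

# At w = 0 this gives exactly d = 50.0, since ln(0.5 * (0 + 2)) = ln(1) = 0.
for w in (0, 25, 50, 100):
    print(w, power_consumption(w, a, b, c, d))
```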
I hope it is more clear what my idea is. Let me know if you think that it can be useful or if it is totally overkill.
It is exactly what I was thinking but couldn't explain it so clearly. Thank you.
I think we should work with CPU family (architecture) / core number rather than the commercial naming (Xeon, …) for several reasons:
Could you explain the process with the equations to ease the implementation part? For example, how do you define a, b, c, d in your equation?
Also, could we apply this mechanism when more than 2 values are given (0%, 50%, 100% for instance)? Does it make sense?
1 - Input data
{
  "cpu": {
    "family": "skylake",
    "nb_core": 8
  },
  "workload": {
    "0": 51,    <<< Power consumption in idle state (in W)
    "50": 293,  <<< Power consumption in 50% state (in W)
    "100": 413  <<< Power consumption in 100% state (in W)
  }
}
2 - Look for equivalent consumption profile
- If a consumption profile exists with the same family and core number: use it.
- Else, if a consumption profile exists with the same family: use it.
- Else: use the default consumption profile and go to (4).
3 - Infer the consumption profile for the current type of CPU
→ What magic are you doing here?
4 - Generate the consumption profile equation from 1) the inferred curve and 2) the input data → What magic are you doing here?
power_consumption(workload) = a * ln(b * (workload + c)) + d
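The fallback lookup in step (2) could be sketched like this; the profile store, its keys, and the parameter values are all hypothetical, not the actual boaviztapi code:

```python
# Hypothetical profile store: log-model coefficients keyed by
# (family, nb_core), with None as a family-level wildcard.
PROFILES = {
    ("skylake", 8): {"a": 35.0, "b": 0.7, "c": 1.5, "d": 50.0},
    ("skylake", None): {"a": 30.0, "b": 0.6, "c": 2.0, "d": 45.0},
}
DEFAULT_PROFILE = {"a": 25.0, "b": 0.5, "c": 2.5, "d": 40.0}

def lookup_profile(family: str, nb_core: int) -> dict:
    """Most specific profile first, then family-level, then the default."""
    if (family, nb_core) in PROFILES:
        return PROFILES[(family, nb_core)]
    if (family, None) in PROFILES:
        return PROFILES[(family, None)]
    return DEFAULT_PROFILE

exact = lookup_profile("skylake", 8)      # family + core count match
family_only = lookup_profile("skylake", 16)  # falls back to family level
default = lookup_profile("broadwell", 4)     # falls back to the default
```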
> Could you explain the process with the equations to ease the implementation part? For example, how do you define a, b, c, d in your equation?
The implementation is really easy: it just uses the scipy.optimize.curve_fit function to create all the previous models. Basically, it is an optimization problem where we try to fit a function (power_consumption(workload) = a * ln(b * (workload + c)) + d) to some data points. If we have multiple data points, we can fit one model per CPU and then merge all the models into one by averaging the parameters (a, b, c, d) of the models. I have only set the following constraints on the parameters:
The optimization done by curve_fit to find a, b, c and d is a least squares approximation.
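A minimal sketch of this fitting step, using the three-point input example from earlier in the thread; the p0 starting values and bounds are my own guesses, not the POC's actual settings:

```python
import numpy as np
from scipy.optimize import curve_fit

# The log model from the discussion; a, b, c, d are the parameters to fit.
def power_consumption(workload, a, b, c, d):
    return a * np.log(b * (workload + c)) + d

# Data points from the input example above: idle, 50 % and 100 % load.
workloads = np.array([0.0, 50.0, 100.0])
watts = np.array([51.0, 293.0, 413.0])

# Least-squares fit; the bounds keep b and c positive so the log stays
# defined, and p0 is just a rough starting point for the optimizer.
params, _ = curve_fit(power_consumption, workloads, watts,
                      p0=[100.0, 1.0, 1.0, 50.0],
                      bounds=([0.0, 1e-6, 1e-6, -np.inf], np.inf))
a, b, c, d = params
fitted = power_consumption(workloads, a, b, c, d)
```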
I can provide you the POC in a notebook if you want? (I have to clean it up a bit first.)
> Also, could we apply this mechanism when more than 2 values are given (0%, 50%, 100% for instance)? Does it make sense?
The optimization process described above can work with 2 or more values. With more values we can expect higher precision. Depending on the number of data points we have, it can be useful to start the optimization from a base model (like the Platinum model); that way we start with parameters that are already defined and we can just "try to shift the curve" until it meets the min workload and max workload, for instance. But I think that if we have 3 or more data points in input we don't need that first step, as the model we try to fit is very simple and regular.
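This "shift from a base model" idea amounts to passing the base model's parameters as curve_fit's starting point p0. The base parameters below are invented for illustration, not the real Platinum model:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_consumption(workload, a, b, c, d):
    return a * np.log(b * (workload + c)) + d

# Hypothetical parameters of a previously fitted base model
# (e.g. the "Platinum model" mentioned above); illustrative values only.
base_params = [80.0, 0.6, 2.0, 55.0]

# Only two measured points for the new CPU: idle and full load.
workloads = np.array([0.0, 100.0])
watts = np.array([51.0, 413.0])

# Start the fit from the base model's parameters (p0=base_params) so the
# optimizer "shifts" an already realistic curve instead of starting blind.
params, _ = curve_fit(power_consumption, workloads, watts,
                      p0=base_params,
                      bounds=([0.0, 1e-6, 1e-6, -np.inf], np.inf))
shifted = power_consumption(workloads, *params)
```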
> I think we should work with CPU family (architecture) / core number rather than the commercial naming (Xeon, …) for several reasons: ...
I have tried to put the family (or architecture?) in front of each CPU, tell me if you see an error, but I think it is OK. It gives me that:
CPU model | CPU family |
---|---|
Intel Xeon E-2278G | Coffee Lake |
Intel Xeon E3 1240v6 | Sandy Bridge |
Intel Xeon E5-2660 | Sandy Bridge |
Intel Xeon E5-2686 v4 | Broadwell |
Intel Xeon Gold 5120 | Skylake |
Intel Xeon Gold 5218 | Cascade Lake |
Intel Xeon Gold 6230R | Cascade Lake |
Intel Xeon Platinum 8124M | Skylake |
Intel Xeon Platinum 8151 | Skylake |
Intel Xeon Platinum 8175M | Skylake |
Intel Xeon Platinum 8252C | Cascade Lake |
Intel Xeon Platinum 8259CL | Cascade Lake |
Intel Xeon Platinum 8275CL | Cascade Lake |
Intel Xeon Silver 4110 | Skylake |
Intel Xeon Silver 4114 | Skylake |
Intel Xeon Silver 4210R | Cascade Lake |
Intel Xeon Silver 4214 | Cascade Lake |
(I've removed the ones with * for now because I don't understand why they look so weird on the graphs...)
Given that classification I can plot all CPU power consumption curves per family.
Only one CPU for both Coffee Lake and Broadwell so I haven't plotted them.
You see that on each graph we clearly have different CPU profiles even though they are from the same family/architecture. And on the first 2 graphs we can see that the Platinum ones are always close together at the top, then Gold ones, and then Silver ones at the bottom.
That is why I first grouped them by CPU "model" (Platinum, Gold, Silver, E3, E5, E): when you plot them together, their profiles look very similar even though they are not from the same family/architecture or launch year.
If we take into account the number of CPU cores in addition to the CPU architecture, it is still not really satisfying:
(The number after the CPU full name is the number of cores, e.g. "Intel Xeon Platinum 8275CL 24" cores)
You have CPUs with fewer cores above CPUs with more cores, and vice versa.
Let me know what you think; maybe it is a subject to discuss in a meeting? But in the end, at this stage I am only convinced by grouping CPUs by their "model". Of course, if we have more data we can then consider architecture and number of cores, but only within the same CPU model group.
Thank you for the explanations.
From your work it seems very clear that grouping by CPU model is the best strategy. As you mentioned, it would be nice to find data on other CPUs (AMD for instance) to validate this.
@github-benjamin-davy since this strategy is based on your data I think your opinion would be precious.
I think a Jupyter notebook is a good input for the implementation if you can provide it.
I thought we could begin to implement it as a route (POST /cpu/consumption_profil) which takes a CPU object and a workload object and returns the coefficients a, b, c, d of the function.
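As a sketch of what such a route handler could compute, here is a plain function simulating it; the field names follow the examples in this thread, and the p0 defaults are assumptions, not the actual boaviztapi code:

```python
import numpy as np
from scipy.optimize import curve_fit

def consumption_profile_route(body: dict) -> dict:
    """Take a cpu + workload object and return the fitted coefficients."""
    workload = body["workload"]
    x = np.array(sorted(workload), dtype=float)
    y = np.array([workload[k] for k in sorted(workload)], dtype=float)

    # Fit a * ln(b * (workload + c)) + d to the given points.
    def model(w, a, b, c, d):
        return a * np.log(b * (w + c)) + d

    params, _ = curve_fit(model, x, y,
                          p0=[100.0, 1.0, 1.0, y[0]],
                          bounds=([0.0, 1e-6, 1e-6, -np.inf], np.inf))
    a, b, c, d = params
    return {"a": a, "b": b, "c": c, "d": d}

body = {"cpu": {"family": "skylake", "nb_core": 8},
        "workload": {0: 51, 50: 293, 100: 413}}
coeffs = consumption_profile_route(body)
```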
The usage of the consumption profile will be implemented in #87 and #88
It makes me think that we should add a model attribute to the CPU object:
I will modify #82 to make it possible to complete family and model from cpu name.
Hello here, I'll try to catch up on the discussion,
@samuelrince the * is simply used as a way to exclude some lines from the VLOOKUP in the spreadsheet (some lines refer to underclocked machines).
I would say that the most essential characteristic of the CPU is its TDP which should most of the time be close to the max consumption from what I've experienced (however two CPUs with the same TDP might not have the exact same behavior). As you have seen, CPUs from the same family can have very different power consumption for the same number of cores (depends on voltage & frequency).
Hey @da-ekchajzer you can take a look at this notebook as a working implementation.
Implemented as a router for CPU in https://github.com/Boavizta/boaviztapi/pull/113
Problem
A consumption profile is a function which associates a workload with an electrical_consumption:
consumption_profil(workload) = electrical_consumption
This continuous function will be generated from point measurements at different workloads for a given configuration. The point measurements could come from our own measurements or from secondary sources.
We want to provide a way to generate continuous consumption profiles (functions) from those point measurements. Such a process could be used to evaluate the usage impacts of devices or components.
Solution
We should set up a regression process. We call regression the process of defining a continuous relationship (function) between workload and electrical_consumption based on point measurements.
The regression shouldn't be linear. From what we have seen, consumption profiles follow a logarithmic rule.
This might be a problem when only two points are given (min 0% and max 100%, for instance), since we don't want a linear distribution. We could use existing consumption profiles in the regression process.
Input value
Format
We should have this type of input format:
Data example
Example for AWS server CPUs from Teads: Link
Example from SPECpower aggregated by Cloud Carbon Footprint: Link
Output value
A function described by its coefficients