Closed JacobValdemar closed 7 months ago
Thanks for wanting to contribute!
id | manufacturer | CASE.case_type | year | vcpu | platforme_vcpu | CPU.units | CPU.core_units | CPU.name | CPU.manufacturer | CPU.model_range | CPU.family | CPU.tdp | CPU.manufacture_date | instance.ram_capacity | RAM.capacity | RAM.units | SSD.units | SSD.capacity | HDD.units | GPU.name | GPU.units | GPU.TDP | GPU.memory_capacity | POWER_SUPPLY.units | POWER_SUPPLY.unit_weight | USAGE.instance_per_server | USAGE.time_workload | USAGE.use_time_ratio | USAGE.hours_life_time | USAGE.other_consumption_ratio | USAGE.overcommited | Warnings | configuration.disk.units | configuration.disk.type | configuration.disk.capacity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a1.2xlarge | AWS | rack | 2018 | 8 | 16 | 1 | 16 | Graviton | Annapurna Labs | Graviton | Graviton | 40 | 2018 | 16 | 16 | 2 | 0 | 0 | 0 | 2;2;2 | 2.99;1;5 | 2 | 50;0;100 | 1 | 35040 | 0.33;0.2;0.6 | 0 | 0 |
If the CPU of the instance is not listed in https://github.com/Boavizta/boaviztapi/blob/dev/boaviztapi/data/crowdsourcing/cpu_specs.csv it is recommended to add it if you have some information on the CPU. If not, the completion process will be used each time the instance is requested.
When a piece of information is not known, leave it empty; the API will complete it.
Most configuration information can be found in the AWS documentation or third-party services. See for c5.12xlarge :
We don't have an automatic process to populate our database. If you wish to do it automatically, feel free to provide a Jupyter notebook.
Hope this answers all your questions. Feel free to ask more.
@da-ekchajzer thanks for the description! So basically I should just leave empty if I can't find the data?
I have looked at the existing data to see if I can understand how existing data is extrapolated from external data sources. However, it seems that there are some errors in existing data. Can that be true? If there are errors in existing data, should I then fix them? Also it seems that for many instances, defaults are hardcoded into the datasheet, shouldn't they be omitted as you said? For example, POWER_SUPPLY.units is specified as 2;2;2 for all rows.
Also, should Previous Generation Instances be removed? Such as c1.medium and c1.xlarge?
I have tried to create a table describing where to get data for the columns. Does it seem right? Every place where I have put just ? means I am currently unable to figure out where/how to get the data.
column | constant | instances.vantage.sh data export | other source |
---|---|---|---|
id | | "API Name" | |
manufacturer | AWS | | |
CASE.case_type | rack | | |
year | | | https://instancetyp.es/ |
vcpu | | "vCPUs" | |
platforme_vcpu | ? | | |
CPU.units | ? | | |
CPU.core_units | ? | | |
CPU.name | | "Physical Processor", extrapolated | |
CPU.manufacturer | | "Physical Processor", extrapolated | |
CPU.model_range | | "Physical Processor", extrapolated | |
CPU.family | | | aws webpage instance type description |
CPU.tdp | | | lookup crowdsourcing/cpu_specs |
CPU.manufacture_date | | | same as year? or cpu_specs.release_date? |
instance.ram_capacity | | "Instance Memory" | |
RAM.capacity | | ? "Instance Memory" !=/= RAM.capacity * RAM.units / usage.instance_per_server ? | existing data doesn't seem valid if |
RAM.units | ? | | |
SSD.units | | "Instance Storage", extrapolated | aws webpage instance type table |
SSD.capacity | | "Instance Storage", extrapolated | aws webpage instance type table |
HDD.units | ? | | |
HDD.units | ? | | |
GPU.name | | "GPU model" | |
GPU.units | | "GPUs" | |
GPU.TDP | ? | | |
GPU.memory_capacity | | BAD or "GPU memory" / "GPUs" or "GPU memory" | |
POWER_SUPPLY.units | 2;2;2 why? | | |
POWER_SUPPLY.unit_weight | 2.99;1;5 why? | | |
USAGE.instance_per_server | | metal.vCPU / "vCPUs", apparently does not hold for all types | |
USAGE.time_workload | 50;0;100 ? why?? is it default;min;max? | | |
USAGE.use_time_ratio | 1 | | |
USAGE.hours_life_time | 35040 (4 years, but many have manufacture_date < 2019?) | | |
USAGE.other_consumption_ratio | 0.33;0.2;0.6 (PUE?) source? what is x;y;z ? | | |
USAGE.overcommited | 0 or 1, why which? | | |
Warnings | | | |
configuration.disk.units | ? | | |
configuration.disk.type | ? | | |
configuration.disk.capacity | ? | | |
> So basically I should just leave empty if I can't find the data?

Yes and no :). I was a little too quick in my response. I would say yes as a first step, and then we will discuss how to account for unknown data (either a default value or a range).
To address uncertainty in situations where a value is not known or cannot be determined, we employ a default value accompanied by a minimum and maximum range. These parameters are utilized in the impact calculation procedure to evaluate a spectrum of potential impacts, including average, minimum, and maximum values. When specifying a range for a value within a CSV file, we format it as follows: value;min;max.
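As a minimal sketch of how such a cell could be read, assuming the `value;min;max` layout described above: the `RangedValue` type and the `parse_cell` helper are hypothetical names used for illustration and are not taken from the boaviztapi code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangedValue:
    """A default value with its lower and upper bounds (hypothetical type)."""
    default: float
    low: float
    high: float


def parse_cell(cell: str) -> Optional[RangedValue]:
    """Parse a CSV cell holding either 'value' or 'value;min;max'.

    An empty cell returns None, meaning "unknown, to be completed later".
    """
    cell = cell.strip()
    if not cell:
        return None
    parts = [float(p) for p in cell.split(";")]
    if len(parts) == 1:
        # A single value is treated as a range with no spread.
        return RangedValue(parts[0], parts[0], parts[0])
    if len(parts) == 3:
        default, low, high = parts
        if not low <= default <= high:
            raise ValueError(f"default outside [min, max] in cell: {cell!r}")
        return RangedValue(default, low, high)
    raise ValueError(f"expected 'value' or 'value;min;max', got: {cell!r}")


# Cells taken from the example row of aws.csv quoted in this thread.
power_supply_weight = parse_cell("2.99;1;5")
time_workload = parse_cell("50;0;100")
```

Treating a single value as a zero-width range is a design choice of this sketch; it lets the caller handle both forms the same way.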
> I have looked at the existing data to see if I can understand how existing data is extrapolated from external data sources. However, it seems that there are some errors in existing data. Can that be true?

It is definitely possible that some data are wrong, either because of inadvertent errors in the manual process or because the data have changed. Feel free to generate another file that follows the same format.
> If there are errors in existing data, should I then fix them?

Yes, feel free! You can also (if possible) add a source row to track where the data come from.
> Also it seems that for many instances, defaults are hardcoded into the datasheet, shouldn't they be omitted as you said? For example, POWER_SUPPLY.units is specified as 2;2;2 for all rows.

Yes, some mandatory data are hard-coded. You can leave them blank when you are not sure, and I will complete them on your PR. Don't hesitate to ask on a case-by-case basis if you want to understand our assumptions.
Regarding power supply, we assume that there are 2 units. I invite you to keep this assumption if you have no information to the contrary.
What is important to understand is that the components described in the file correspond to the whole machine/platform (which is equivalent to the metal version of the EC2 type).
column | Comments | constant | instances.vantage.sh data export | other source |
---|---|---|---|---|
id | yes | | "API Name" | |
manufacturer | yes | AWS | | |
CASE.case_type | yes | rack | | |
year | yes but not used in the calculation, so I wouldn't make this data mandatory | | | https://instancetyp.es/ |
vcpu | yes | | "vCPUs" | |
platforme_vcpu | platform == metal: metal.vCPU | ? | | |
CPU.units | Corresponds to the number of CPUs in the platform. You can derive it from the platform's vCPU count and the number of vCPUs provided by one CPU of the given name: platform_vcpu / nb_vcpu(cpu_name) | ? | | |
CPU.core_units | Number of cores per CPU (usually 1 core == 2 vCPUs); if not provided, it will be completed from crowdsourcing/cpu_specs | ? | | |
CPU.name | Yes, should match a CPU in crowdsourcing/cpu_specs | | "Physical Processor", extrapolated | |
CPU.manufacturer | Yes, if not provided will be completed from crowdsourcing/cpu_specs | | "Physical Processor", extrapolated | |
CPU.model_range | Yes, if not provided will be completed from crowdsourcing/cpu_specs | | "Physical Processor", extrapolated | |
CPU.family | Yes, if not provided will be completed from crowdsourcing/cpu_specs (in cpu_specs family == code_name) | | | aws webpage instance type description |
CPU.tdp | Yes, if not provided will be completed from crowdsourcing/cpu_specs | | | lookup crowdsourcing/cpu_specs |
CPU.manufacture_date | yes but not used in the calculation, so I wouldn't make this data mandatory | | | same as year? or cpu_specs.release_date? |
instance.ram_capacity | yes | | "Instance Memory" | |
RAM.capacity | See the RAM section below | | ? "Instance Memory" !=/= RAM.capacity * RAM.units / usage.instance_per_server ? | existing data doesn't seem valid if |
RAM.units | See the RAM section below | ? | | |
SSD.units | Yes, only if the instance hosts an SSD (not EBS) | | "Instance Storage", extrapolated | aws webpage instance type table |
SSD.capacity | Yes, only if the instance hosts an SSD (not EBS) | | "Instance Storage", extrapolated | aws webpage instance type table |
HDD.units | I think no instance has an HDD | ? | | |
HDD.units | I think no instance has an HDD | ? | | |
GPU.name | Yes, GPUs are not taken into account for now but will be soon, so feel free to collect this data | | "GPU model" | |
GPU.units | Yes, GPUs are not taken into account for now but will be soon, so feel free to collect this data | | "GPUs" | |
GPU.TDP | Will be completed from name in future versions | ? | | |
GPU.memory_capacity | Memory for 1 GPU | | BAD or "GPU memory" / "GPUs" or "GPU memory" | |
POWER_SUPPLY.units | 1 + 1 backup | 2;2;2 why? | | |
POWER_SUPPLY.unit_weight | Since we don't know this data we have a range between 1 and 5 with an average of 2.99 | 2.99;1;5 why? | | |
USAGE.instance_per_server | Yes. I would be interested to see the errors | | metal.vCPU / "vCPUs", apparently does not hold for all types | |
USAGE.time_workload | Yes. If not specified by users when the instance is requested we will compute it for 0%, 50% and 100% | 50;0;100 ? why?? is it default;min;max? | | |
USAGE.use_time_ratio | Yes, it means that it's up 100% of the time. | 1 | | |
USAGE.hours_life_time | It is a theoretical life duration. Used to allocate the embedded impacts. | 35040 (4 years, but many have manufacture_date < 2019?) | | |
USAGE.other_consumption_ratio | We only model the consumption of CPU and RAM. This ratio is used to account for the consumption of other components. | 0.33;0.2;0.6 (PUE?) source? what is x;y;z ? | | |
USAGE.overcommited | Boolean. Not used for now. If overcommitted, a vCPU might be shared. Its impacts should also be shared. | 0 or 1, why which? | | |
Warnings | | | | |
configuration.disk.units | Legacy. It has been removed | ? | | |
configuration.disk.type | Legacy. It has been removed | ? | | |
configuration.disk.capacity | Legacy. It has been removed | ? | | |
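The two platform-level derivations mentioned in the table (`platform_vcpu / nb_vcpu(cpu_name)` and `metal.vCPU / "vCPUs"`) can be sketched as below. The function names are hypothetical, and the figure of 16 vCPUs per Graviton CPU is an assumption read off the a1.2xlarge example row (1 CPU with 16 cores), not a value looked up in cpu_specs.

```python
def instances_per_server(platform_vcpu: int, instance_vcpu: int) -> float:
    """USAGE.instance_per_server: metal vCPUs divided by the instance's vCPUs."""
    return platform_vcpu / instance_vcpu


def cpu_units(platform_vcpu: int, vcpu_per_cpu: int) -> float:
    """CPU.units: platform vCPUs divided by the vCPUs one physical CPU provides."""
    return platform_vcpu / vcpu_per_cpu


# a1.2xlarge example row: vcpu = 8, platforme_vcpu = 16.
# Assumption: one Graviton CPU exposes 16 vCPUs (one per core).
print(instances_per_server(platform_vcpu=16, instance_vcpu=8))
print(cpu_units(platform_vcpu=16, vcpu_per_cpu=16))
```

A non-integer result from either function would be a hint that the row needs a manual check, which matches the remark that the rule does not seem to hold for all instance types.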
### RAM
For the RAM, we need to know both the number of RAM sticks and the capacity of each stick for the platform. What we usually have is the total quantity of RAM for the EC2 metal instance. In this case, we look for a logical distribution of RAM sticks (the largest possible capacity among 8 GB, 16 GB, 32 GB, 128 GB). Example:
r6g.metal : 512.0 GB ==> 4*128 GB
It's very unscientific, but it's the best we can do. Hence the warning "RAM.capacity not verified".
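One possible reading of this heuristic, as a rough sketch: pick the largest candidate stick capacity that divides the platform's total RAM evenly. The `split_ram` helper and the even-division rule are assumptions made for illustration; the actual choices in aws.csv were made by hand and may differ.

```python
from typing import Tuple

# Candidate stick capacities listed above, in GB, largest first.
STICK_SIZES_GB = (128, 32, 16, 8)


def split_ram(total_gb: float) -> Tuple[int, int]:
    """Return (RAM.units, RAM.capacity) for a platform's total RAM in GB.

    Assumption: the largest stick size that divides the total evenly wins.
    """
    for size in STICK_SIZES_GB:
        units, remainder = divmod(total_gb, size)
        if units >= 1 and remainder == 0:
            return int(units), size
    raise ValueError(f"no candidate stick size divides {total_gb} GB evenly")


units, capacity = split_ram(512.0)  # total RAM of r6g.metal
print(f"{units} x {capacity} GB")
```

Because the result is a guess about the hardware layout, rows filled in this way would still carry the "RAM.capacity not verified" warning.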
Feel free to join our public chat if you want to discuss or to set up a synchronous call: https://chat.boavizta.org/signup_user_complete/?id=97a1cpe35by49jdc66ej7ktrjc
Missing instances have been added in PR https://github.com/Boavizta/boaviztapi/pull/237
Bug description
aws.csv is missing around 300 instance types. See this gist.
I would like to help adding the missing instance types, but I can't seem to figure out where the data come from or if there is a script used for adding new types or updating the file. A description of how you normally discover the relevant data for the instance types would be incredibly helpful.
Expected behavior
That all instance types are in the file.