clusterinthecloud / support

If you need help with Cluster in the Cloud, this is the right place

Can't change instance type in limits.yaml #10

Closed trnh closed 3 years ago

trnh commented 3 years ago

I tried to change the instance type from t3a.small to m5.large and m5.xlarge, and also tried M4 and C5 types, but ran into the same issue shown below:

[citc@mgmt ~]$ vi limits.yaml

[citc@mgmt ~]$ finish
Error: Could not find shape information for 'm5.large'. Please log a ticket at https://github.com/clusterinthecloud/terraform/issues/new
Error: Could not find shape information for 'm5.xlarge'. Please log a ticket at https://github.com/clusterinthecloud/terraform/issues/new

milliams commented 3 years ago

Currently we hard-code all instance types into the file at /etc/citc/shapes.yaml. This is because the amount of memory actually available for Slurm to use on a node is not the same as the total memory that an instance type advertises. Therefore we have to boot each instance type up and check the output of free -m.
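As a rough illustration of that check (a sketch, not the project's actual tooling), the total memory can be read from the `Mem:` row of `free -m`, which reports mebibytes. The function name and the sample output are made up for the example; the numbers were not measured on a real instance.

```python
def total_memory_mib(free_output: str) -> int:
    """Return the 'total' column of the Mem: row from `free -m` output."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            # Columns after the label: total, used, free, shared, buff/cache, available
            return int(fields[1])
    raise ValueError("no 'Mem:' row found in free output")


# Illustrative sample only; these figures are NOT from a real m5.large.
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:           7821         312        7100           8         408        7270
Swap:             0           0           0
"""

if __name__ == "__main__":
    print(total_memory_mib(SAMPLE))
```

On a booted node the same function could be fed the text captured from running `free -m` there, for example via `subprocess.run(["free", "-m"], capture_output=True, text=True).stdout`.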

For now, you should be able to edit /etc/citc/shapes.yaml to add in those types and rerun the finish command.

In principle, the memory lost to overhead should be relatively predictable, so we could just grab the description from the API and reduce the memory limit. That hasn't been a top priority until now, though, as this project started on Oracle Cloud, which only had a few different "shapes".
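A minimal sketch of that idea, assuming the overhead can be modelled as a fixed fraction of the advertised memory with a small absolute floor. Both the 5% fraction and the 256 MiB floor are placeholder assumptions for illustration, not values measured or used by CitC, and the function name is hypothetical.

```python
def slurm_memory_mib(advertised_mib: int,
                     overhead_fraction: float = 0.05,
                     floor_mib: int = 256) -> int:
    """Estimate the memory Slurm may use from the advertised instance memory.

    overhead_fraction and floor_mib are assumed placeholders, not measured values.
    """
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")
    # Reserve whichever is larger: the proportional overhead or the fixed floor.
    reserved = max(int(advertised_mib * overhead_fraction), floor_mib)
    return max(advertised_mib - reserved, 0)


if __name__ == "__main__":
    for mib in (2048, 8192, 16384):
        print(mib, "->", slurm_memory_mib(mib))
```

Erring on the side of reserving too much is the safer choice here, since a node that reports less memory than its configured value is the case to avoid.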

trnh commented 3 years ago

If the instance has a lot of memory per vCPU, for example 8 GB per vCPU like the C5, M, and R types, do we still need to check the memory?

milliams commented 3 years ago

It's possible to set the memory that Slurm knows about to some value which is smaller than the available system memory if you don't care about using it all. However, Slurm does require some value of memory to be recorded.
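For illustration, this is roughly what declaring a deliberately conservative value could look like as a slurm.conf node definition, where `RealMemory` is given in megabytes. This is a sketch only: the helper name, node name, and numbers are hypothetical, and it is not how CitC generates its configuration.

```python
def node_line(name: str, cpus: int, system_mib: int, real_memory_mib: int) -> str:
    """Build a slurm.conf node definition with a conservative RealMemory."""
    if real_memory_mib <= 0:
        raise ValueError("Slurm needs some positive memory value recorded")
    if real_memory_mib > system_mib:
        # Declaring more than the node really has is the case to avoid.
        raise ValueError("RealMemory must not exceed the system memory")
    return f"NodeName={name} CPUs={cpus} RealMemory={real_memory_mib}"


if __name__ == "__main__":
    # Hypothetical node with 7821 MiB visible to the OS, declared as 7000.
    print(node_line("compute-1", 2, system_mib=7821, real_memory_mib=7000))
```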

milliams commented 3 years ago

I have now added the ability for CitC to query the AWS API to get this information. That means that all instance types should now be available automatically.
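For readers curious what such a lookup can look like, here is a sketch using boto3's EC2 `describe_instance_types` call. It is not the code CitC uses; the helper names and the returned dictionary layout are assumptions made for the example. boto3 is imported lazily so that the parsing helper can be used without it installed.

```python
def shape_from_description(desc: dict) -> dict:
    """Extract vCPU count and memory from one DescribeInstanceTypes entry."""
    return {
        "instance_type": desc["InstanceType"],
        "vcpus": desc["VCpuInfo"]["DefaultVCpus"],
        "memory_mib": desc["MemoryInfo"]["SizeInMiB"],
    }


def fetch_shapes(instance_types, region_name: str) -> list:
    """Query AWS for the given instance types (needs boto3 and credentials)."""
    import boto3  # lazy import: only required for the live query

    ec2 = boto3.client("ec2", region_name=region_name)
    # With an explicit, short list of types the result fits in one response.
    response = ec2.describe_instance_types(InstanceTypes=list(instance_types))
    return [shape_from_description(d) for d in response["InstanceTypes"]]


if __name__ == "__main__":
    # Offline example shaped like one entry of the API response.
    entry = {
        "InstanceType": "m5.large",
        "VCpuInfo": {"DefaultVCpus": 2},
        "MemoryInfo": {"SizeInMiB": 8192},
    }
    print(shape_from_description(entry))
```

The figure returned by the API is the advertised memory, so the value handed to Slurm would still need to be somewhat lower than `memory_mib`.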