Closed (gwolski closed this issue 1 week ago)
Your code looks correct. I'm testing it right now.
ClusterConfig:
  Scheduling:
    SlurmSettings:
      ScaledownIdletime: 20
I made the change in my configuration. I downloaded the config file that gets generated and confirmed that the setting shows up in the ParallelCluster config file. I updated my cluster, and the update to the config and ParallelCluster succeeded. I checked slurm_parallelcluster.conf and confirmed that SuspendTime is set to 1200 seconds, which is 20 minutes, so I think it is working. I was initially a little confused because there is no ScaledownIdletime parameter in slurm.conf. The corresponding Slurm parameter is SuspendTime.
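For anyone who wants to script the same check, here is a minimal sketch of the comparison described above. It assumes ScaledownIdletime is given in minutes and SuspendTime in seconds, as observed in this thread (20 and 1200). The function names and the sample text are hypothetical, and the parser only handles a plain numeric `SuspendTime=<n>` line.

```python
from typing import Optional


def expected_suspend_time(scaledown_idletime_minutes: int) -> int:
    """Convert ScaledownIdletime (minutes) to the expected SuspendTime (seconds)."""
    return scaledown_idletime_minutes * 60


def read_suspend_time(conf_text: str) -> Optional[int]:
    """Return the numeric SuspendTime found in Slurm-style conf text, or None.

    Assumption: the value is a plain integer; other values are not handled here.
    """
    for raw_line in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Compare the key case-insensitively as a precaution.
        if key.strip().lower() == "suspendtime":
            return int(value.strip())
    return None


# Made-up sample containing only the value reported in this thread.
sample_conf = "# illustrative excerpt\nSuspendTime=1200\n"

if __name__ == "__main__":
    expected = expected_suspend_time(20)
    actual = read_suspend_time(sample_conf)
    print(f"expected={expected} actual={actual} match={expected == actual}")
```

On a real cluster you would pass in the contents of slurm_parallelcluster.conf instead of the sample string.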
PLBKAC. Solved.
Here is the error I was getting. It happens right after the AMI builds section is output (I've copied and pasted a bit of that here for context):
  "Rocky": {
    "8": {
      "arm64": {},
      "x86_64": {}
    },
    "9": {
      "arm64": {},
      "x86_64": {}
    }
  }
}
Traceback (most recent call last):
File "/proj/work/gwolski/aws-eda-slurm-cluster-3.11.1/source/app.py", line 31, in
TypeError: list indices must be integers or slices, not str
Subprocess exited with error 1
I tried so many variants that I must have copied and pasted the wrong code into my issue here. Here is the offending code that caused the above error, which I should have realized is wrong.
```
ClusterConfig:
  Scheduling:
    - SlurmSettings:
        ScaledownIdletime: 20
```
Note the '-' in front of the SlurmSettings. Argh. Damn (tired) user. Never file a ticket when you are tired. Thank you.
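To show why that one '-' produces this particular TypeError, here is a small sketch. It does not reproduce the actual code in app.py. The two literals stand in for what a YAML parser would return for the correct and the incorrect snippets, and `get_scaledown_idletime` is a hypothetical lookup written only for illustration.

```python
# Correct form: every level is a mapping, so each level parses to a dict.
correct = {
    "ClusterConfig": {
        "Scheduling": {
            "SlurmSettings": {"ScaledownIdletime": 20},
        },
    },
}

# With '-' in front of SlurmSettings, Scheduling parses to a list
# holding a single mapping instead of being a mapping itself.
wrong = {
    "ClusterConfig": {
        "Scheduling": [
            {"SlurmSettings": {"ScaledownIdletime": 20}},
        ],
    },
}


def get_scaledown_idletime(config):
    """Hypothetical lookup that indexes every level by string key."""
    return config["ClusterConfig"]["Scheduling"]["SlurmSettings"]["ScaledownIdletime"]


def describe_lookup(config):
    """Return the value, or the TypeError message if a level is a list."""
    try:
        return f"ok: {get_scaledown_idletime(config)}"
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print(describe_lookup(correct))
    print(describe_lookup(wrong))
```

Indexing a list with the string "SlurmSettings" is what raises the error, so any code that expects Scheduling to be a dict will fail this way on the list form.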
I have now used the appropriate code, as you have shown, and I see the correct entry in the YAML file when downloaded with PCUI and also the value SuspendTime=1200 in the slurm_parallelcluster.conf file. All good.
I'm trying to increase the timeout of the ScaledownIdletime.
I added the following ClusterConfig/SlurmSettings/ScaledownIdletime to my cluster config file:
I've discovered this is the wrong syntax. Your documentation only says to make ClusterConfig a dict. I looked at config_schema.py and there is not much to go on there either. I've tried multiple variations, including:
I just can't figure it out. The latter example at least causes the Python code to throw an error.
I have been able to get the simple case of tags to work:
Can you please add some examples (specifically one for my use case) to your documentation and to this issue?
How do I add a section in the config file to change the ScaledownIdletime?