Closed: jprobichaud closed this issue 4 years ago
@jprobichaud, that's a fair request. There isn't an easy way to do this with CfnCluster today as we just have a default installation of SGE which doesn't track compute node memory. For the time being, you could look into adding the ram_free configuration to a post_install script that would kick in every time a compute node is fired up.
I'll leave this ticket open for tracking this as a feature request.
BTW, I tried to add the following commands inside the post_install.sh script, but unfortunately, the hosts aren't added as execution hosts when the post install script is executed.
ram_free=$(grep MemTotal /proc/meminfo | awk '{print $2}' | perl -nle '$a=$_; $g = int($a /1024/1024); print $g,"G";')
su -c "SGE_ROOT=/opt/sge /opt/sge/bin/lx-amd64/qconf -mattr exechost complex_values ram_free=$ram_free,exclusive=true `hostname`" sgeadmin
This generates an error that looks like:
denied: host "ip-172-31-17-49.us-west-2.compute.internal" is neither submit nor admin host
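That error means the compute node itself has no SGE admin/submit rights. One possible workaround (a sketch, not verified on CfnCluster: it assumes the master node is already an SGE admin host, that passwordless SSH between nodes works, and `MASTER_HOST` is a placeholder you would have to set) is to register each compute node from the master before it runs qconf locally:

```shell
#!/bin/sh
# Sketch: ask the master node to grant this compute node SGE admin and
# submit rights, so that a local qconf call is then permitted.
set -eu
node="$(hostname)"
if command -v qconf >/dev/null 2>&1; then
    # Run on the master: -ah adds an admin host, -as adds a submit host.
    ssh "${MASTER_HOST:?set MASTER_HOST}" "qconf -ah $node && qconf -as $node"
else
    echo "qconf not found; sketch only, skipping for $node"
fi
```

Alternatively, the qconf call could be executed on the master itself, targeting the compute node's hostname, which avoids touching the admin-host list at all.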
Because we have announced that we will be deprecating support for SGE in the near-future (see: https://github.com/aws/aws-parallelcluster/wiki/Deprecation-of-SGE-and-Torque-in-ParallelCluster), we will not be performing additional enhancements specific to SGE.
I am going to close this issue. If you would like to request a similar enhancement for one of our other supported schedulers (Slurm or AWS Batch), please feel free to create a new issue.
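For reference, Slurm handles this natively: slurmd reports each node's memory and jobs request it directly, so no post_install hook is needed. An illustrative slurm.conf fragment (node names and the RealMemory value are placeholders, not taken from any ParallelCluster default):

# slurm.conf (illustrative): make memory a schedulable resource.
NodeName=compute[1-10] RealMemory=61440 State=UNKNOWN
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

A job would then request memory with, e.g., sbatch --mem=8G job.sh.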
I couldn't find that information in the documentation, if this is a case of RTFM, let me know!
For many memory-intensive scripts, it is useful to be able to specify the amount of free RAM that must be available on an SGE compute node for it to accept a job. I managed to run sudo su sgeadmin, set up the shell variables, and issue qconf -mc to add the line
ram_free ram_free MEMORY <= YES YES 1G 0
and now I will have to issue a further qconf command on each host to set the available memory there.
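Put together, the per-host step sketched above could look roughly like this (a sketch: it assumes a Linux node, that it runs as a user with SGE admin rights, and that the ram_free complex has already been added via qconf -mc):

```shell
#!/bin/sh
# Sketch: compute this node's total memory in whole GB, then (if SGE is
# present) attach it to the host as the ram_free complex value.
set -eu
# e.g. "61G" on a 64 GiB node.
ram_free="$(awk '/MemTotal/ {printf "%dG", $2/1024/1024}' /proc/meminfo)"
echo "detected ram_free=$ram_free"
if command -v qconf >/dev/null 2>&1; then
    qconf -mattr exechost complex_values "ram_free=$ram_free" "$(hostname)"
fi
```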
This isn't super easy to do at this point, especially with hosts that are constantly changing. Could we have a real solution for this?
FWIW: I got these instructions from this post in the Kaldi forums.