Closed leoisl closed 4 years ago
Oh man, that's annoying. I think only supporting the latest would be silly. We should be flexible. I will look into whether there is a way to specify the units that is backwards compatible.
Damn, it looks like the ability to specify memory units was only added in 10.1.0.2 (See "Improvements to units for resource requirements and limits").
The only way of handling this that I can think of right now is getting the `LSF_UNIT_FOR_LIMITS` variable from `lsf.conf` and using it to convert MB into whatever unit it has set. If `LSF_UNIT_FOR_LIMITS` is not present, then we assume KB (which is the LSF default).
Luckily this file is supposed to live in a standardised location: `${LSF_ENVDIR}/lsf.conf`. On my cluster:

```
$ grep LSF_UNIT_FOR_LIMITS ${LSF_ENVDIR}/lsf.conf
LSF_UNIT_FOR_LIMITS=MB
```
This is annoying as it will add a tiny bit of IO to each job, but these files on our cluster seem to have ~150 lines so it shouldn't be too bad.
The best solution would be to add some auxiliary functions to the `MemoryUnits` enum that can handle this conversion for us.
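Something along these lines, perhaps; the member set and method names are just a sketch of what such helpers on a `MemoryUnits` enum could look like:

```python
from enum import Enum


class MemoryUnits(Enum):
    """Memory units as powers of 1024 bytes."""

    KB = 1
    MB = 2
    GB = 3
    TB = 4

    @property
    def scaling_factor(self) -> int:
        """Number of bytes in one of this unit."""
        return 1024 ** self.value

    def convert_to(self, value: float, target: "MemoryUnits") -> float:
        """Convert `value`, expressed in this unit, into `target` units."""
        return value * self.scaling_factor / target.scaling_factor
```

So converting a rule's 2048 MB into the cluster's GB would be `MemoryUnits.MB.convert_to(2048, MemoryUnits.GB)`.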
In terms of the fastest way to get the value of `LSF_UNIT_FOR_LIMITS`, we should benchmark a pythonic way and also a system call to `grep`. Although, is it safe to assume `grep` is present on all machines? The version of `grep` won't matter, as we would purely be using `grep PATTERN FILE`.
Do you see a better solution or any problems with this suggestion @leoisl ?
This solution is fine for me! I am just wondering if there is a way of getting the value of `LSF_UNIT_FOR_LIMITS` only once, when we are configuring the profile through cookiecutter; but instead of being configured by the user, it would be detected automatically for the cluster the profile is being set up on. I am assuming this value rarely changes.
Actually, that is a really good point. I guess we can just ask the user for this value during setup. We can provide code for how to get it both in the README and potentially in the cookiecutter prompts too.
Today I implemented a library to deal with memory units as this is becoming something I am having to do regularly. So tomorrow I will use this to fix this issue.
I use LSF in two different clusters. In one, this is the LSF version:
Explicitly specifying memory units in this LSF version works just fine, e.g. this command works fine:
In the other cluster, the version is a bit older:
Explicitly specifying memory units in this cluster fails:
It works if `MB` is removed. I did not track which update of LSF between these two versions enabled memory units to be specified in `-M`. I wonder if: