Closed haoliu1213 closed 4 years ago
Thank you so much. This bug was actually fixed in v2.2-beta.0 by removing the {vf} option when running on an LSF system, to avoid unexpected errors. Users can use the {cpu} option to control the total number of subtasks running on a compute node. Thank you again.
I see, but with {vf} removed there is no way to request a memory reservation. It is easy to run out of memory in the sort stage, because other large-memory jobs may be running on the same node.
Yes, because I do not have an LSF system and cannot debug it. So you mean an LSF system uses -R rusage[mem={vf}] -M {vf} to control the CPU and memory for a job? Might it include some typos?
-n {cpu} controls the CPU count. -R rusage[mem={vf}] means the job will use {vf} of memory, so the system allocates a node with more than {vf} available to the job. -M {vf} means the system will kill the job if its memory usage exceeds {vf}. After fixing it, I can run the whole pipeline successfully without the -dbuf option; otherwise some nodes get stuck because they run out of memory in the correction stage.
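For reference, here is a minimal sketch of how those three options could be assembled into one string. The function name lsf_resource_args is hypothetical and not part of NextDenovo; only the option string itself comes from the comment above. It assumes the memory value is already a bare number in whatever unit the cluster expects.

```python
def lsf_resource_args(cpu, mem):
    """Build the LSF resource options discussed above.

    cpu: number of slots requested with -n.
    mem: memory as a bare number (no 'G'/'M' suffix); the unit is
         whatever the cluster is configured to expect (assumption).
    """
    # -n sets the CPU count, -R rusage[mem=...] reserves memory,
    # -M sets the limit above which the job is killed.
    return "-n {cpu} -R rusage[mem={vf}] -M {vf}".format(cpu=int(cpu), vf=int(mem))


print(lsf_resource_args(4, 8192))
# prints: -n 4 -R rusage[mem=8192] -M 8192
```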
ok, thanks.
Hi, could you help me write some code that gets the value of LSF_UNIT_FOR_LIMITS automatically? I cannot find an LSF system to test on.
ok, i will have a try.
kit.py
##add func##
def lsf_mem(mem):
    # Convert a memory string such as '8G' into a bare number (returned as a
    # string) expressed in the unit given by LSF_UNIT_FOR_LIMITS.
    import re, os
    LSF_UNIT_FOR_LIMITS = "" #lsf default unit, which is defined by LSF system,
    LSF_CONF_BASENAME = "lsf.conf"
    # LSF_ENVDIR may be unset, so fall back to an empty string instead of None
    LSF_CONF_FILEPATH = os.path.join(os.getenv('LSF_ENVDIR', ''), LSF_CONF_BASENAME)
    if os.path.isfile(LSF_CONF_FILEPATH):
        with open(LSF_CONF_FILEPATH, 'r') as f:
            # re.search returns None when the key is missing, so check before group()
            unit = re.search(r'LSF_UNIT_FOR_LIMITS=(\S+)', f.read(), re.M)
            if unit:
                LSF_UNIT_FOR_LIMITS = unit.group(1)
    if not LSF_UNIT_FOR_LIMITS:
        LSF_UNIT_FOR_LIMITS="MB"
    # Within each block use elif: once mem has been converted to an int,
    # calling re.search on it again would raise a TypeError.
    if re.search('K',LSF_UNIT_FOR_LIMITS,re.I):
        if re.search('K',mem,re.I):
            mem = re.search(r'\d+',mem).group()
            mem = int(mem)
        elif re.search('M',mem,re.I):
            mem = re.search(r'\d+',mem).group()
            mem = int(mem) * 1024
        elif re.search('G',mem,re.I):
            mem = re.search(r'\d+',mem).group()
            mem = int(mem) * 1024 * 1024
        elif re.search('T',mem,re.I):
            mem = re.search(r'\d+',mem).group()
            mem = int(mem) * 1024 * 1024 * 1024
    elif re.search('M',LSF_UNIT_FOR_LIMITS,re.I):
        if re.search('K',mem,re.I):
            mem = 1  # anything below 1 MB is rounded up to 1
        elif re.search('M',mem,re.I):
            mem = re.search(r'\d+',mem).group()
            mem = int(mem)
        elif re.search('G',mem,re.I):
            mem = re.search(r'\d+',mem).group()
            mem = int(mem) * 1024
        elif re.search('T',mem,re.I):
            mem = re.search(r'\d+',mem).group()
            mem = int(mem) * 1024 * 1024
    elif re.search('G',LSF_UNIT_FOR_LIMITS,re.I):
        if re.search('K',mem,re.I):
            mem = 1  # anything below 1 GB is rounded up to 1
        elif re.search('M',mem,re.I):
            mem = 1
        elif re.search('G',mem,re.I):
            mem = re.search(r'\d+',mem).group()
            mem = int(mem)
        elif re.search('T',mem,re.I):
            mem = re.search(r'\d+',mem).group()
            mem = int(mem) * 1024
    return str(mem)
task_control.py
self.vf = str(vf) if vf else self.cpu + 'G'
##add code##
if self.job_type == 'lsf':
    self.vf = lsf_mem(self.vf)
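A more compact way to do the same unit conversion is a lookup table of powers of 1024. This is only a sketch: convert_mem is a hypothetical name, not part of NextDenovo, and it takes the target unit as an argument instead of reading lsf.conf. Unlike the function above, it rounds fractional results up to the next whole unit, and it assumes a value with no suffix is already in the target unit.

```python
import math
import re

# Power of 1024 for each unit, relative to KB.
_UNIT_POWER = {"K": 0, "M": 1, "G": 2, "T": 3}


def convert_mem(mem, target_unit="MB"):
    """Convert a string such as '8G' to a bare integer string in target_unit."""
    m = re.match(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT])?B?\s*$", str(mem), re.I)
    if not m:
        raise ValueError("unrecognised memory value: %r" % (mem,))
    dst = target_unit[:1].upper()
    if dst not in _UNIT_POWER:
        raise ValueError("unrecognised target unit: %r" % (target_unit,))
    number = float(m.group(1))
    # Assumption: a value without a suffix is already in the target unit.
    src = (m.group(2) or dst).upper()
    scaled = number * 1024.0 ** (_UNIT_POWER[src] - _UNIT_POWER[dst])
    # Round up, and never return less than 1 of the target unit.
    return str(max(1, int(math.ceil(scaled))))


print(convert_mem("8G", "MB"))    # prints: 8192
print(convert_mem("512K", "MB"))  # prints: 1
```

The value passed as target_unit would come from LSF_UNIT_FOR_LIMITS, however it is obtained on a given cluster.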
Ok, thank you. I will add it in the next version, but I may make some changes.
Hi, when I use -R rusage[mem={vf}] -M {vf} to request a memory reservation and limit in LSF, the software fails. After testing several times, I found that lsf-drmaa can't accept a memory request string with a unit (e.g. GB, MB); it only takes a number. So I added a function to fix this in task_control.py (Run class, init func, from line 152). The NextDenovo version is 2.1-beta.0.
The code is rough; maybe you have a better way to fix it. I hope so.