UCSF-Costello-Lab / LG3_Pipeline

The original LG3 pipeline
https://github.com/UCSF-Costello-Lab/LG3_Pipeline

Remove `bigmem` resource request from PBS script #87

Closed HenrikBengtsson closed 6 years ago

HenrikBengtsson commented 6 years ago

Several *.pbs scripts have PBS declarations that request bigmem nodes, e.g.

#PBS -l nodes=1:bigmem:ppn=12,vmem=200gb

Do we really need that? I would think specifying mem/vmem would take care of that. I don't know the history of bigmem but it could be a legacy from old times.

Having bigmem in there could prevent a job from ending up on a non-bigmem node if one submits the PBS script with, say, qsub -l nodes=1:ppn=6,vmem=50gb ...

ivan108 commented 6 years ago

Good point, I think we can remove "bigmem"... Do you know how many bigmem and "smallmem" nodes are available e.g. for Costello lab?

HenrikBengtsson commented 6 years ago

Ok, let's plan on removing that.

  1. I don't think there's such a thing as a smallmem resource/node; a node is either flagged as bigmem or not.
  2. You can run pbsnodes | grep -E "(^n[0-9]+|big|small)" on the cluster to see which nodes have the bigmem flag. See https://ucsf-ti.github.io/tipcc-web/about/specs.html for which nodes Costello Lab has access to.
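
To get just the names of the bigmem nodes rather than the raw grep output, something like the following could work. It is only a sketch: it assumes Torque-style pbsnodes output, where each node name starts in column 0 and is followed by indented "key = value" lines including a "properties = ..." line. The function name list_bigmem_nodes is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: list nodes whose 'properties' line mentions bigmem.
# Assumption: Torque-style `pbsnodes` output (node name in column 0,
# followed by indented 'key = value' lines).
# list_bigmem_nodes is a hypothetical helper name.
list_bigmem_nodes() {
  awk '
    # An unindented, non-empty line starts a new node record.
    /^[^[:space:]]/ { node = $1 }
    # Print the current node if its properties line contains bigmem.
    /^[[:space:]]+properties = / && /bigmem/ { print node }
  '
}

# Usage on the cluster: pbsnodes | list_bigmem_nodes
```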

ivan108 commented 6 years ago

I see the picture now... So we have 10 nodes with 32-64GB of RAM, which could be used for small-memory jobs. Good to know! I will work on removing bigmem; I guess a sed one-liner should do it...
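
Such a one-liner might look like the sketch below. It assumes the flag always appears as ":bigmem" inside a "#PBS -l nodes=..." line, as in the example at the top of this issue, and that GNU sed is available (for in-place editing with -i). The function name strip_bigmem is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: drop the ':bigmem' node property from '#PBS -l nodes=...' lines.
# Assumptions: GNU sed (for -i); the flag is written as ':bigmem'.
# strip_bigmem is a hypothetical helper name.
strip_bigmem() {
  # Only touch PBS directive lines that request nodes; other lines are left as-is.
  sed -i -E '/^#PBS[[:space:]]+-l[[:space:]]+nodes=/ s/:bigmem//g' "$@"
}

# Usage: strip_bigmem *.pbs
```

Restricting the substitution to the "#PBS -l nodes=" lines avoids touching any other mention of bigmem in the scripts, for example in comments or echo statements.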

ivan108 commented 6 years ago

Removed!

HenrikBengtsson commented 6 years ago

Related to this, @shuntsman-ucsf, did you manage to drop bigmem using qalter on TIPCC? If not, we can do a quick release of the current develop version where bigmem is no longer used.

ivan108 commented 6 years ago

what is qalter ?? I did remove it from *.pbs files...

HenrikBengtsson commented 6 years ago

> what is qalter ??

https://www.jlab.org/hpc/PBS/qalter.html - Scott has lots of jobs already queued up, and by using qalter he can modify the existing resource specs of those jobs.

> I did remove it from *.pbs files...

Yup, thxs for that - that's in the develop branch. Scott is always running the latest stable release (to avoid him using a broken/poorly tested develop commit among other things).

shuntsman-ucsf commented 6 years ago

> Related to this, @shuntsman-ucsf, did you manage to drop bigmem using qalter on TIPCC? If not, we can do a quick release of the current develop version where bigmem is no longer used.

I did not see any mention of "bigmem" in any of the text output by 'qstat -f'. And when I released several modified jobs from hold, some went to a new node for the first time, so hopefully not an issue anymore. I only used the following commands:

qalter -l nodes=1:ppn=10 jobid
qalter -l vmem=20G jobid
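
For many queued jobs, the same two calls could be wrapped in a loop. This is a minimal sketch rather than what was actually run: the function name alter_jobs and the DRY_RUN switch are made up for illustration, and the resource values are simply the ones from the two commands above.

```bash
#!/usr/bin/env bash
# Sketch: apply the two qalter calls above to several queued jobs.
# alter_jobs and DRY_RUN are hypothetical names, not part of PBS/Torque.
# With DRY_RUN=1 the commands are only printed, not executed.
alter_jobs() {
  local jobid res
  for jobid in "$@"; do
    for res in nodes=1:ppn=10 vmem=20G; do
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "qalter -l $res $jobid"
      else
        qalter -l "$res" "$jobid"
      fi
    done
  done
}

# Usage (job IDs are placeholders): DRY_RUN=1 alter_jobs jobid1 jobid2
```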

HenrikBengtsson commented 6 years ago

Sounds like those qalter calls of yours caused any bigmem flags to be dropped. Good. Then we don't need to rush out a new release.