Closed — pansapiens closed this issue 6 years ago
Not sure I understand this one; shouldn't the user be able to specify any amount of memory through the `-memory` flag on the command line?
For BDS to correctly set the memory for an sbatch/qsub job, it needs to be specified for the BDS task, not the whole pipeline. Each task is its own job on the queue, so it makes sense to explicitly set memory for any task that will require significant memory. This is independent of any command line memory options STAR, picard/JVM etc. might have (ideally the BDS task `mem` setting should be slightly higher than any memory setting the tool itself uses).
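For illustration, a per-task reservation in a BDS script might look like the sketch below. The task body and values are hypothetical: `mem` is in bytes, and the point is that the 34 GiB task reservation sits slightly above the 32 GB limit passed to the tool itself.

```
# Hypothetical sketch: the mem := reservation is what sbatch/qsub sees,
# set slightly above the tool's own internal memory limit
starTaskMem := 36507222016   # 34 GiB reserved for the queued job
task(cpus := threads, mem := starTaskMem, taskName := "STAR align") {
    sys STAR --runThreadN $threads --limitBAMsortRAM 32000000000 # plus the usual STAR args
}
```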
@pansapiens sorry been busy, took a while to get back to this issue. Is this not what you need https://github.com/MonashBioinformaticsPlatform/RNAsik-pipe/blob/2c19a5dfd8c17238a75f036a821e5a16835bcbfc/src/sikSTARaligner.bds#L89 ?
Because STAR memory gets set exactly here..
Or have I fixed this because of this comment and forgot to close this issue?
Ah, I see - just setting `-memory` would solve the immediate issue with STAR index generation, but this exposes a set of related issues (especially with regard to running on an HPC job queue).

This `-memory` setting is used for both STAR and BWA, which have fairly different RAM requirements. With a name like `-memory` I'd expected it to be a setting that somehow applies to the whole run. Since it really only applies to STAR and BWA, I think `-memory` should be renamed `-alignerMemory`, or even better split into `-starMemory` and `-bwaMemory` (so you can't accidentally switch aligners but forget to change the `-alignerMemory` setting).
For proper utilisation on a cluster, every task should have its `mem :=` option set explicitly (with sensible defaults), otherwise any task that consumes more RAM than the default job limit on that cluster will be killed (eg SLURM's default value for the `--mem` setting might be only 4 GB). Also, small tasks may needlessly wait in the queue for more RAM than they really need, when a node with a few cores and a small amount of RAM is actually available. The `bds.config` `mem` setting can be used as a fallback default for tasks without `mem :=` specified, but it's better not to rely on this.
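For example, such a fallback in `bds.config` might look like this (value is illustrative; assuming the same byte units as the task-level `mem :=` syntax used elsewhere in this thread):

```
# bds.config: fallback resources applied to any task that doesn't set its own.
# Illustrative value, in bytes.
mem = 4294967296    # 4 GiB default per task
```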
Just want to add a couple of general comments here. @pansapiens and I have spoken offline and online about memory allocation issues. I'll prioritise this now and attempt to implement the `mem` setting for every task, or at least for every "major" task.
Now it turns out that if you attempt to run `RNAsik` on a machine with small resources, which won't be enough to start the STAR aligner, then BDS behaviour is somewhat unexpected, or at least this is how I'm interpreting it. If there isn't enough compute resource for a particular task, `RNAsik` (bds) will skip that task and go to the next available task that meets the resource and dependency requirements.

This is a "grey" area of `RNAsik`, where not every task has dependencies set up, mainly because it was either hard to do OR there were no real dependencies.

This shouldn't affect those who run `RNAsik` with the right system requirements, that is at least 30 GB of RAM and 4 cpus (this is based on human/mouse; for species with smaller genomes, less RAM will be required).
This is important for running on HPC queues (eg SLURM). Setting it to the default value used for alignment might be sufficient.
A quick look suggested index generation on GRCm38 consumed ~26 GB RAM, but may have peaked higher.
i.e., for 64 GB of RAM (over-allocating, but safe), specify:
task(!fastaRef.isEmpty(), genomeIdxFiles <- fastaRef, cpus := threads, mem := 68719476736, taskName := "Making STAR index")
This probably needs to be generalized in `sik.config` to allow memory settings for each task that might consume more than ~4 GB of RAM (index generation, alignment, mark duplicates), with sensible defaults.
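One possible shape for this, with hypothetical option names (none of these keys exist in `sik.config` yet; values in bytes are illustrative defaults):

```
# Hypothetical sik.config entries: per-task memory in bytes
starIndexMem = 68719476736   # 64 GiB for STAR index generation
starAlignMem = 34359738368   # 32 GiB for STAR alignment
bwaAlignMem  = 17179869184   # 16 GiB for BWA alignment
markDupsMem  = 8589934592    # 8 GiB for picard MarkDuplicates
```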