Open jdrnevich opened 3 years ago
Hi Jenny,
unfortunately, it's not possible to control the amount of RAM used at the alignment step - it's determined by the genome index size. A ~3 Gbase genome should fit into 32GB, but I would request a bit more, say 35GB.
--limitBAMsortRAM 10000000000 is only needed if you were to use --genomeLoad shared memory options, which is not recommended for cluster jobs.
Without a shared memory genome, the genome will be unloaded before sorting, and the memory available for BAM sorting will be equal to the genome index size.
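Since the alignment memory follows the genome index size, one rough way to pick a scheduler request is to measure the index directory and add some headroom. The sketch below is only an illustration: the helper names and the 10% headroom are assumptions (the headroom loosely mirrors the 32GB to 35GB suggestion above), and it assumes the on-disk size of the index files approximates what STAR loads into RAM.

```python
import math
import os

def index_size_bytes(genome_dir):
    """Sum the sizes of all regular files directly inside genome_dir."""
    total = 0
    for entry in os.scandir(genome_dir):
        if entry.is_file():
            total += entry.stat().st_size
    return total

def suggested_request_gb(size_bytes, headroom=0.10):
    """Index size plus headroom, in decimal GB, rounded up (headroom is an assumption)."""
    return math.ceil(size_bytes * (1 + headroom) / 1e9)

# Example with a hypothetical 32 GB index size
print(suggested_request_gb(32_000_000_000))
```

With a real index you would call suggested_request_gb(index_size_bytes("path/to/index")), where the path is whatever was passed to --genomeDir.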
Also, I would recommend adding the GTF file at the genome generation step, not at the mapping step - this would save time and memory at the mapping step.
Cheers Alex
Thanks, Alex. So should we still use --limitGenomeGenerateRAM in the alignment to indicate how much memory STAR has access to, or does it get automatically set to whatever the genome index had when it was created?
The reason we do not add the GTF file at the genome generation step is that the gene model annotation changes a lot more frequently than the genome. We are a core that analyzes lots of different species for many different researchers, and we have fewer reference indexes to maintain by not including the GTF. Though I admit we haven't done any benchmarking to see how much extra time it adds during the alignment vs. not having to remake a new index every quarter when a new Ensembl/Gencode gene set comes out!
Hi Jenny,
once the genome is created, STAR automatically allocates as much memory as it needs, so you do not need --limitGenomeGenerateRAM at the mapping stage.
I agree that in your case it makes perfect sense to add annotations on the fly, to avoid keeping multiple different references. It only adds a few minutes to each run, so for long enough runs it should not be a big overhead. However, it might be significant for short runs that you were talking about (5M reads?).
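To make the advice above concrete, here is a minimal sketch of a mapping command that adds the GTF on the fly and leaves out --limitGenomeGenerateRAM. The STAR options are standard ones, but the file names, thread count, and the build_star_align_cmd helper are hypothetical placeholders, not the command from this thread.

```python
import shlex

def build_star_align_cmd(genome_dir, fastq, gtf, threads=4, prefix="sample_"):
    """Build a STAR mapping command as an argument list (paths are placeholders)."""
    return [
        "STAR",
        "--runMode", "alignReads",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,        # index built without annotations
        "--readFilesIn", fastq,
        "--sjdbGTFfile", gtf,             # annotations inserted at mapping time
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
        # no --limitGenomeGenerateRAM: per the reply above, it is not needed here
    ]

cmd = build_star_align_cmd("star_index", "sample_R1.fastq", "annotation.gtf")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where STAR is installed.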
Cheers Alex
I am using a computer cluster where I have to request a specific amount of memory from the SLURM scheduler. To generate an index, I request 60 GB and then run the following code:
(60000000000 B = 55.8794 GiB, so I use that shorthand to make sure I am under my requested memory)
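The arithmetic in that note can be checked with a small sketch. The function names are made up for illustration; the only assumption is that the scheduler counts a gigabyte as 1024^3 bytes, which is why the byte limit comes out below 60.

```python
GIB = 1024 ** 3  # bytes in one binary gigabyte (GiB)

def bytes_to_gib(n_bytes):
    """Convert bytes to binary gigabytes (GiB)."""
    return n_bytes / GIB

def bytes_to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (GB)."""
    return n_bytes / 1e9

limit = 60_000_000_000  # value passed as the byte limit
print(f"{bytes_to_gb(limit):.1f} GB = {bytes_to_gib(limit):.4f} GiB")
# prints: 60.0 GB = 55.8794 GiB
```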
Now I want to do the alignment of fastqs that only have 5 M reads each, so I do not need very much memory. If I only request 10GB of memory from the scheduler, what argument(s) do I need to give the alignment script so that it will not use more than 10GB? I always thought it was also --limitGenomeGenerateRAM, but in discussion with someone else and reading the manual, it sounds like that only applies to generating the index. Looking at the 15.11 Limits section of the manual, maybe I only need to add --limitBAMsortRAM with a specific value so it will not default to the "genome index size", like so:
Would this be the correct way to limit the alignment memory? Thanks!