Closed pansapiens closed 5 years ago
The test case that is failing due to this issue is: https://www.ebi.ac.uk/ena/data/view/PRJNA430225
@pansapiens not sure if this is correct fix ..
this is that relevant part of code
int sortBdsMem = sortMem*sortCpu
// need to scale down by 80 % because sam takes a bit more RAM then actually given
real scalFact = 0.85
// also need to covert to bytes as samtools sort needs that way
string sortSamMem = round(scalFact * sortMem/1000)+"K"
sortMem
is amount of memory per thread, but you further divide it by sortCpu
i.e number of threads.
I can see how this will "fix" the issue of memory assignment, but this isn't correct fix, don't you think?
Ah, I see, you are correct. I'd missed the fact that sortBdsMem
is already multiplied by the number of threads. Doing sortMem/sortCpu
is solving it by severely under-allocating memory to samtools (or conversely heavily over-allocating the SLURM --mem
).
What this is saying then, is samtools
probably takes more memory than what is specified as the memory-per-thread - there must be some extra overhead there. Maybe reducing scalFact
further (to 0.5) is the answer - certainly in my testing, 0.85 here is not always reliable.
right, just picked at the source and found this
2107 @param max_mem approxiate maximum memory (very inaccurate)
bloody hell
max_mem
is what's used for sorting bam files
so I'm thinking even if I don't multi-thread we still going to face that same out of mem problem...
I'll scale down more than
Essentially the same issue as: https://github.com/samtools/samtools/issues/831
The amount of memory passed to
samtools sort -m
is currently incorrect and results insamtools sort: couldn't allocate memory for bam_mem
errors, or SLURM killing the task due to OOM.The amount of memory used by
samtools sort -m
is per thread, so if we have 4 threads and 8 Gb of memory given to SLURM (sik.config:samtoolsSortMem = 8000000000
), then-m
needs to be 8 Gb / 4 threads = 2 Gb:samtools sort -m 20000000K --threads 4
. If you add the ~85 % overhead fudge factor then it would be (8 / 4) * 0.85 = 17000000K (the fudge factor doesn't seem necessary in this case, based on my quick tests).