MonashBioinformaticsPlatform / RNAsik-pipe

RNAsik - more than just a pipeline
https://monashbioinformaticsplatform.github.io/RNAsik-pipe/
Apache License 2.0

Memory set for samtools sort is incorrect, doesn't account for number of threads #39

Closed. pansapiens closed this issue 5 years ago.

pansapiens commented 5 years ago

Essentially the same issue as: https://github.com/samtools/samtools/issues/831

The amount of memory passed to samtools sort -m is currently incorrect and results in samtools sort: couldn't allocate memory for bam_mem errors, or SLURM killing the task due to OOM.

The amount of memory given via samtools sort -m is per thread, so if we have 4 threads and 8 GB of memory given to SLURM (sik.config: samtoolsSortMem = 8000000000), then -m needs to be 8 GB / 4 threads = 2 GB: samtools sort -m 2000000K --threads 4. If you add the ~85 % overhead fudge factor it would be (8 GB / 4) * 0.85 = 1700000K (the fudge factor doesn't seem necessary in this case, based on my quick tests).
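The arithmetic above can be sketched as a small helper (illustrative only; the function name is made up, and the 0.85 fudge factor is the value quoted in this thread, not a measured constant):

```python
def samtools_sort_mem(total_bytes, threads, fudge=0.85):
    """Per-thread memory string for `samtools sort -m`.

    `-m` is a per-thread limit, so the total SLURM budget must be
    divided by the thread count; the fudge factor leaves headroom
    because samtools tends to use somewhat more than asked.
    """
    per_thread_kb = round(total_bytes * fudge / threads / 1000)
    return f"{per_thread_kb}K"

# 8 GB reservation across 4 threads, with the 85 % fudge factor:
print(samtools_sort_mem(8_000_000_000, 4))              # -> 1700000K
# Without the fudge factor: 8 GB / 4 threads = 2 GB per thread:
print(samtools_sort_mem(8_000_000_000, 4, fudge=1.0))   # -> 2000000K
```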

pansapiens commented 5 years ago

The test case that is failing due to this issue is: https://www.ebi.ac.uk/ena/data/view/PRJNA430225

pansapiens commented 5 years ago

Fix in PR: https://github.com/MonashBioinformaticsPlatform/RNAsik-pipe/pull/40

serine commented 5 years ago

@pansapiens not sure if this is the correct fix...

this is the relevant part of the code:

    int sortBdsMem = sortMem*sortCpu
    // scale down to ~85 % because samtools takes a bit more RAM than actually given
    real scalFact = 0.85
    // also need to convert to kilobytes, since samtools sort expects the value that way
    string sortSamMem = round(scalFact * sortMem/1000)+"K"

sortMem is the amount of memory per thread, but you further divide it by sortCpu, i.e. the number of threads.

I can see how this will "fix" the issue of memory assignment, but it isn't the correct fix, don't you think?
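The point above can be checked with a little arithmetic (the 2 GB per-thread figure is illustrative; variable names mirror the snippet above):

```python
# sortMem is already a per-thread budget in sik.config, as noted above.
sort_mem = 2_000_000_000   # 2 GB per thread (illustrative value)
sort_cpu = 4

slurm_total = sort_mem * sort_cpu    # sortBdsMem reserved from SLURM: 8 GB
intended_m  = sort_mem               # what -m should receive: 2 GB
extra_div_m = sort_mem // sort_cpu   # dividing again by sortCpu: 0.5 GB

# With the extra division, samtools only ever touches
# extra_div_m * sort_cpu = 2 GB of an 8 GB reservation: the OOM goes
# away, but three quarters of the reservation sits idle.
print(slurm_total, intended_m, extra_div_m)
```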

pansapiens commented 5 years ago

Ah, I see, you are correct. I'd missed the fact that sortBdsMem is already multiplied by the number of threads. Doing sortMem/sortCpu is solving it by severely under-allocating memory to samtools (or conversely heavily over-allocating the SLURM --mem).

What this suggests, then, is that samtools probably takes more memory than the specified memory-per-thread; there must be some extra overhead there. Maybe reducing scalFact further (to 0.5) is the answer; certainly in my testing, 0.85 here is not always reliable.
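One way to think about the fudge factor (purely illustrative; the overhead ratio is an assumption, not something measured in this thread): if samtools actually touches roughly `overhead` times what -m asks for per thread, then staying inside the SLURM reservation requires scalFact to be at most 1 / overhead.

```python
def max_safe_scal_fact(overhead: float) -> float:
    # If real usage ~= overhead * (-m) per thread, then
    # threads * (-m) * overhead <= slurm_mem requires
    # scalFact <= 1 / overhead.
    return 1.0 / overhead

# e.g. if samtools were to use ~2x its -m budget, 0.85 would overshoot
# the reservation but 0.5 would fit:
print(max_safe_scal_fact(2.0))  # -> 0.5
```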

serine commented 5 years ago

right, just peeked at the source and found this:

2107   @param  max_mem  approxiate maximum memory (very inaccurate)                                                                      

bloody hell

max_mem is what's used for sorting bam files

serine commented 5 years ago

so I'm thinking even if I don't multi-thread, we're still going to face that same out-of-memory problem...

I'll scale down more then.