Closed (18parkky closed this issue 6 months ago)
Hello 18parkky,
Thanks for your inquiry.
NanoRepeat uses the GaussianMixture function in the sklearn library to phase the reads. However, GaussianMixture uses all available cores by default and has no direct parameter to control the number of threads. This is because GaussianMixture depends on some numpy functions, which in turn rely on multi-threaded libraries like OpenMP.
The number of threads used by OpenMP can be controlled with the environment variable OMP_NUM_THREADS. I've made adjustments in NanoRepeat to set the OMP_NUM_THREADS environment variable to 1. Despite this limit on core usage, the impact on total runtime appears minimal from my observations. Please use git clone https://github.com/WGLab/NanoRepeat.git to get the latest version and test it.
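For anyone who wants to apply the same cap in their own script, here is a minimal sketch of the idea. It is not NanoRepeat's actual code: the helper name limit_native_threads is hypothetical, and setting OPENBLAS_NUM_THREADS and MKL_NUM_THREADS alongside OMP_NUM_THREADS is my assumption about which backend numpy may be linked against on a given machine. The variables generally need to be set before numpy or sklearn is imported, since the thread pools are usually sized when the libraries load.

```python
import os


def limit_native_threads(num_threads: int = 1) -> None:
    """Cap the threads used by native numeric libraries (hypothetical helper).

    Must run before numpy/sklearn are imported to take effect reliably.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    # OMP_NUM_THREADS is the variable named in this thread; the other two
    # are an assumption covering OpenBLAS- and MKL-backed numpy builds.
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(num_threads)


limit_native_threads(1)

# Only import numpy / sklearn after the cap is in place, for example:
# from sklearn.mixture import GaussianMixture
```

The same effect can usually be had without touching the code by exporting OMP_NUM_THREADS=1 in the shell before launching the program.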
Thanks! Li
Like you said, it seems that the impact of this is minimal in my observations too. Thanks for the reply and update!
18parkky
Hi, thanks for developing NanoRepeat!
I'm trying to measure the runtime of NanoRepeat when running with varying numbers of cores.
However, through Linux's top command, I noticed that NanoRepeat sometimes uses more cores than specified with the -c option. For example, even though I set -c 16, NanoRepeat occasionally uses up to 26 cores. I'm assuming NanoRepeat does this during the alignment step with minimap2, where it uses as many processors as possible, and then shifts back to using fewer cores in other steps.
Do you know why NanoRepeat does this, and is there any way to fix it?
Thanks, 18parkky