Adjust the default setting to speed up read alignment

y9c commented 5 years ago

The _BASIC_SC_RNA_COUNTER.ALIGN_READS (STAR alignment) step in the cellranger count pipeline is extremely slow. By default, cellranger splits the input into 96 chunks, and the alignments run one at a time in sequential order. Meanwhile, each STAR alignment run uses only 4 CPU cores. This seems to be the bottleneck of the whole pipeline.
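To give a rough sense of how much headroom this leaves, here is a back-of-the-envelope sketch. The 96 chunks and 4 threads per chunk are the numbers above; the 32-core host and the helper name `alignment_rounds` are assumptions made up for illustration, and memory limits (which may also restrict concurrency) are ignored.

```python
import math


def alignment_rounds(num_chunks, threads_per_chunk, total_cores):
    """Compare sequential chunk execution with packing chunks onto all cores.

    Returns (rounds_sequential, rounds_packed), where one "round" is the
    wall-clock time of a single chunk alignment. Hypothetical helper, not
    part of cellranger.
    """
    # How many chunk alignments fit on the machine at the same time.
    concurrent = max(1, total_cores // threads_per_chunk)
    rounds_sequential = num_chunks
    rounds_packed = math.ceil(num_chunks / concurrent)
    return rounds_sequential, rounds_packed


# 96 chunks at 4 threads each, on an assumed 32-core host.
print(alignment_rounds(96, 4, 32))
```

This only counts rounds; it says nothing about how STAR itself scales with more threads per run.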

Some of the other tools in cellranger can detect the number of cores on the host machine and make full use of them.

Increasing the thread count in cellranger-cs/3.0.2/mro/sc_rna_counter_cs.mro doubles the performance, but there are still other constraints in the settings.

namisaghaei commented 5 years ago

@yech1990 What are some of the constraints you are talking about? I'm also trying to cut down processing time by using as many cores/threads as possible in the parallelizable stages. I think generating FASTQs and alignment are both completely parallelizable, but I'm struggling to optimize cellranger's use of cores on my machine.

y9c commented 5 years ago

These lines hard-code the thread count:

https://github.com/10XGenomics/cellranger/blob/5f5a6293bbc067e1965e50f0277286914b96c908/lib/python/cellranger/utils.py#L502

https://github.com/10XGenomics/cellranger/blob/5f5a6293bbc067e1965e50f0277286914b96c908/lib/python/cellranger/io.py#L32

I think changing 4 to os.cpu_count() or multiprocessing.cpu_count() would speed this up.
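A minimal sketch of what that replacement could look like. The function name `pick_thread_count` and its parameters are hypothetical and do not exist in cellranger; the only library call used is `multiprocessing.cpu_count()`, which is available in both Python 2 and Python 3 (`os.cpu_count()` only exists from Python 3.4 on).

```python
import multiprocessing


def pick_thread_count(requested=None, fallback=4):
    """Choose a thread count instead of hard-coding 4.

    Hypothetical helper for illustration only.
    requested: optional user-supplied limit; None means "use all detected cores".
    fallback:  value used if the core count cannot be determined.
    """
    try:
        detected = multiprocessing.cpu_count()
    except NotImplementedError:
        # cpu_count() may raise this on platforms where detection fails.
        detected = fallback
    if requested is None:
        return detected
    # Never return less than 1 or more than the machine reports.
    return max(1, min(requested, detected))


print(pick_thread_count())
```

Note that `multiprocessing.cpu_count()` reports the cores of the host, not the cores allocated to a job, so on a shared cluster an explicit `requested` limit is probably the safer choice.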

10XGenomics / cellranger

Adjust the default setting to speed up read alignment #25