hxin opened 6 years ago
I tested the performance of parallel runs with samples from the glucose project.
Running map_reads one sample at a time with 16 cores takes around 17 hours for all 48 samples mapped against mouse/rat; that is 96 STAR runs, i.e. roughly 10 minutes per run. At that rate, 4 samples should take roughly 40 minutes, and 8 samples roughly 80 minutes.
Compared to those numbers, the table below shows a large speed-up when multiple samples are run in parallel, so I think this is worth discussing further. @lweasel
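The per-run estimate above can be sanity-checked with a quick back-of-envelope calculation (assuming, as stated, 48 samples each mapped against both mouse and rat):

```python
# Back-of-envelope check of the serial timing quoted above.
total_hours = 17        # wall time for the full serial pass
samples = 48
species = 2             # each sample is mapped against mouse and rat
star_runs = samples * species
mins_per_run = total_hours * 60 / star_runs
print(star_runs, round(mins_per_run, 1))  # 96 runs at ~10.6 minutes each
```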
Samples | Core(s)/sample | Time (minutes) |
---|---|---|
4(C1 C2 C3 C4) | 1 | 54.4166 |
4(C1 C2 C3 C4) | 2 | 29.4833 |
4(C1 C2 C3 C4) | 4 | 18.5833 |
4(C1 C2 C3 C4) | 8 | 15.0000 |
4(C1 C2 C3 C4) | 16 | 15.5166 |
8(C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2 ) | 1 | 61.9333 |
8(C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2 ) | 2 | 34.9000 |
8(C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2 ) | 4 | 19.1833 |
8(C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2 ) | 8 | 18.5666 |
8(C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2 ) | 16 | 19.8500 |
Sample sizes: C1 3.4G, C2 3.4G, D3 3.9G, B1 3.3G, B2 3.7G, B4 3.4G, D4 3.9G, D1 4.0G, C3 3.4G, C4 3.4G, B3 3.4G, D2 4.3G
Also, interestingly, the time needed changes little from 8 cores to 16 cores, so in terms of STAR runs, in the current implementation, using 8 cores versus 16 cores makes little difference.
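The diminishing returns from 8 to 16 cores show up clearly if the 4-sample wall times are converted to speed-ups over the 1-core-per-sample baseline; a quick check in Python:

```python
# Wall times (minutes) for the 4-sample block (C1-C4) from the table above,
# keyed by cores per sample.
times = {1: 54.4166, 2: 29.4833, 4: 18.5833, 8: 15.0000, 16: 15.5166}
baseline = times[1]
speedup = {cores: round(baseline / t, 2) for cores, t in times.items()}
print(speedup)  # speed-up plateaus around 3.6x: 8 cores gives 3.63, 16 gives 3.51
```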
Further tests were done to measure the performance of the implementation.
The parallelized code was added to sort_reads/map_reads/filter_reads, and I ran Sargasso on the same 6 samples/2 species with two different configurations:
Samples | Core(s)/sample | Max cores | Run 1 (minutes) | Run 2 (minutes) |
---|---|---|---|---|
C1 C2 C5 C6 C9 C10 | 16 | 16 | 251 | 286 |
C1 C2 C5 C6 C9 C10 | 8 | 16 | 204 | 264 |
The improvement is not as large as in the individual tests of sort_reads/map_reads/filter_reads, where the parallel version generally takes half the time with an 8-cores-per-sample, 16-max-cores setup.
This may be because performance is limited by I/O rather than processing speed when these jobs run together.
Overall, though, parallelizing the sort_reads/map_reads/filter_reads stages does give a speed-up.
row_number | job | core | total_core | run | time | time_per_run | sample_per_run | total_sample |
---|---|---|---|---|---|---|---|---|
1 | filtered | 10 | 16 | 7.5 | 108.2833 | 13.535412 | 1.600000 | 12 |
2 | filtered | 12 | 16 | 9.0 | 57.1666 | 6.351844 | 1.333333 | 12 |
3 | filtered | 12 | 32 | 4.5 | 56.4833 | 11.296660 | 2.666667 | 12 |
4 | filtered | 16 | 16 | 12.0 | 110.0833 | 9.173608 | 1.000000 | 12 |
5 | filtered | 16 | 32 | 6.0 | 55.6333 | 9.272217 | 2.000000 | 12 |
6 | filtered | 2 | 16 | 1.5 | 75.4500 | 37.725000 | 8.000000 | 12 |
7 | filtered | 4 | 16 | 3.0 | 70.2000 | 23.400000 | 4.000000 | 12 |
8 | filtered | 4 | 32 | 1.5 | 65.5333 | 32.766650 | 8.000000 | 12 |
9 | filtered | 6 | 16 | 4.5 | 74.4166 | 14.883320 | 2.666667 | 12 |
10 | filtered | 8 | 16 | 6.0 | 70.2500 | 11.708333 | 2.000000 | 12 |
11 | filtered | 8 | 32 | 3.0 | 71.5333 | 23.844433 | 4.000000 | 12 |
12 | mapped | 12 | 16 | 9.0 | 75.2833 | 8.364811 | 1.333333 | 12 |
13 | mapped | 12 | 32 | 4.5 | 72.7000 | 14.540000 | 2.666667 | 12 |
14 | mapped | 16 | 16 | 12.0 | 148.5666 | 12.380550 | 1.000000 | 12 |
15 | mapped | 16 | 32 | 6.0 | 77.3833 | 12.897217 | 2.000000 | 12 |
16 | mapped | 2 | 16 | 1.5 | 158.8500 | 79.425000 | 8.000000 | 12 |
17 | mapped | 4 | 16 | 3.0 | 97.6166 | 32.538867 | 4.000000 | 12 |
18 | mapped | 4 | 32 | 1.5 | 110.5833 | 55.291650 | 8.000000 | 12 |
19 | mapped | 8 | 16 | 6.0 | 80.5500 | 13.425000 | 2.000000 | 12 |
20 | mapped | 8 | 32 | 3.0 | 77.0166 | 25.672200 | 4.000000 | 12 |
21 | sorted | 12 | 16 | 9.0 | 34.6000 | 3.844444 | 1.333333 | 12 |
22 | sorted | 12 | 32 | 4.5 | 33.5333 | 6.706660 | 2.666667 | 12 |
23 | sorted | 16 | 16 | 12.0 | 49.7666 | 4.147217 | 1.000000 | 12 |
24 | sorted | 16 | 32 | 6.0 | 27.4166 | 4.569433 | 2.000000 | 12 |
25 | sorted | 2 | 16 | 1.5 | 76.8000 | 38.400000 | 8.000000 | 12 |
26 | sorted | 4 | 16 | 3.0 | 45.8666 | 15.288867 | 4.000000 | 12 |
27 | sorted | 4 | 32 | 1.5 | 48.1666 | 24.083300 | 8.000000 | 12 |
28 | sorted | 8 | 16 | 6.0 | 41.9833 | 6.997217 | 2.000000 | 12 |
29 | sorted | 8 | 32 | 3.0 | 39.2000 | 13.066667 | 4.000000 | 12 |
```r
a="filtered/10/16/run/7.5
filtered/10/16/time/108.2833
filtered/12/16/run/9.0
filtered/12/16/time/57.1666
filtered/12/32/run/4.5
filtered/12/32/time/56.4833
filtered/16/16/run/12.0
filtered/16/16/time/110.0833
filtered/16/32/run/6.0
filtered/16/32/time/55.6333
filtered/2/16/run/1.5
filtered/2/16/time/75.4500
filtered/4/16/run/3.0
filtered/4/16/time/70.2000
filtered/4/32/run/1.5
filtered/4/32/time/65.5333
filtered/6/16/run/4.5
filtered/6/16/time/74.4166
filtered/8/16/run/6.0
filtered/8/16/time/70.2500
filtered/8/32/run/3.0
filtered/8/32/time/71.5333
mapped/12/16/run/9.0
mapped/12/16/time/75.2833
mapped/12/32/run/4.5
mapped/12/32/time/72.7000
mapped/16/16/run/12.0
mapped/16/16/time/148.5666
mapped/16/32/run/6.0
mapped/16/32/time/77.3833
mapped/2/16/run/1.5
mapped/2/16/time/158.8500
mapped/4/16/run/3.0
mapped/4/16/time/97.6166
mapped/4/32/run/1.5
mapped/4/32/time/110.5833
mapped/8/16/run/6.0
mapped/8/16/time/80.5500
mapped/8/32/run/3.0
mapped/8/32/time/77.0166
sorted/12/16/run/9.0
sorted/12/16/time/34.6000
sorted/12/32/run/4.5
sorted/12/32/time/33.5333
sorted/16/16/run/12.0
sorted/16/16/time/49.7666
sorted/16/32/run/6.0
sorted/16/32/time/27.4166
sorted/2/16/run/1.5
sorted/2/16/time/76.8000
sorted/4/16/run/3.0
sorted/4/16/time/45.8666
sorted/4/32/run/1.5
sorted/4/32/time/48.1666
sorted/8/16/run/6.0
sorted/8/16/time/41.9833
sorted/8/32/run/3.0
sorted/8/32/time/39.2000"
```
```r
require(tidyr)
require(dplyr)
require(reshape2)
require(ggplot2)

read.table(text = a, col.names = c('raw')) %>%
  # split "job/core/total_core/type/value" into separate columns
  tidyr::separate(raw, into = c('job', 'core', 'total_core', 'type', 'value'), sep = "/") %>%
  # spread the run/time rows into columns, one row per job/core/total_core
  reshape2::dcast(job + core + total_core ~ type, value.var = "value") %>%
  dplyr::mutate_at(vars(-job), as.numeric) %>%
  dplyr::mutate(time_per_run = time / ceiling(run),
                sample_per_run = total_core / core,
                total_sample = 12) %>%
  ggplot() +
  geom_point(aes(x = core, y = time, size = time_per_run, colour = total_core)) +
  facet_wrap(~ job)
```
Currently the function runs STAR one sample at a time using ${NUM_CORE} cores. We should test running more samples in parallel, using fewer cores for each sample.
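The proposed scheduling can be sketched as follows (a hypothetical helper, not the current Sargasso code): divide the total core budget into slots of cores_per_sample each and run one sample per slot concurrently.

```python
def schedule(samples, max_cores, cores_per_sample):
    """Group samples into batches that can run concurrently,
    giving each sample cores_per_sample threads."""
    slots = max(1, max_cores // cores_per_sample)
    return [samples[i:i + slots] for i in range(0, len(samples), slots)]

# With 16 cores total and 8 cores per sample, two samples run at once:
batches = schedule(['C1', 'C2', 'C3', 'C4'], max_cores=16, cores_per_sample=8)
print(batches)  # [['C1', 'C2'], ['C3', 'C4']]
```

Each batch's samples would then be launched together, with each STAR invocation given cores_per_sample threads (e.g. via its --runThreadN option); command construction is omitted here since paths and options are project-specific.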