YeoLab / outrigger

Create a *de novo* alternative splicing database, validate splicing events, and quantify percent spliced-in (Psi) from RNA seq data
http://yeolab.github.io/outrigger/
BSD 3-Clause "New" or "Revised" License

Parallelize novel exon finding #7

Closed: olgabot closed this issue 8 years ago

olgabot commented 8 years ago

This can be parallelized easily by returning (or generating) the novel exons, and then adding them all at once at the end
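A minimal sketch of that pattern, not the code from the actual commits: each worker returns its novel exons instead of writing to shared state, and the results are merged in one step at the end. The names `find_novel_exons` and `find_all_novel_exons`, the tuple layout, and the toy "100 bp downstream" rule are all hypothetical stand-ins for illustration. The standard-library `ThreadPoolExecutor` is used only to keep the example self-contained; a process pool would likely suit CPU-bound work better.

```python
from concurrent.futures import ThreadPoolExecutor


def find_novel_exons(junction, known_exons):
    """Hypothetical per-junction search (not outrigger's real API).

    Returns a list of candidate exons and never mutates shared state,
    which is what makes the calls safe to run in parallel.
    """
    chrom, intron_start, intron_stop = junction
    # Toy rule for illustration only: propose a 100 bp exon downstream.
    candidate = (chrom, intron_stop + 1, intron_stop + 100)
    return [] if candidate in known_exons else [candidate]


def find_all_novel_exons(junctions, known_exons, max_workers=4):
    """Run the search for every junction in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_junction = pool.map(
            lambda junction: find_novel_exons(junction, known_exons), junctions
        )
        # Flatten the per-junction lists into one list of exons.
        novel = [exon for exons in per_junction for exon in exons]
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(novel))


known = {("chr1", 201, 300)}
junctions = [("chr1", 100, 200), ("chr1", 400, 500), ("chr1", 400, 500)]

novel = find_all_novel_exons(junctions, known)

# Add everything at once at the end, in a single step.
database = set(known)
database.update(novel)
```

Because the workers only return values, the single merge at the end is the only place that modifies the exon collection, so no locking is needed.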

olgabot commented 8 years ago

Before parallelization:

outrigger index --sj-out-tab  --gtf   4.97s user 1.94s system 64% cpu 10.686 total

After parallelization:

outrigger index --sj-out-tab  --gtf   1.72s user 0.22s system 47% cpu 4.074 total

Whoo, a ~2.6x wall-clock speedup (10.7 s down to 4.1 s) for this small dataset! It should be a LOT more for bigger ones :)

For reference, the command was:

$ time outrigger index \
    --sj-out-tab outrigger/test_data/tasic2016/unprocessed/sj_out_tab/* \
    --gtf outrigger/test_data/tasic2016/unprocessed/gtf/gencode.vM10.annotation.snap25.myl6.gtf
... lots of output ...

I probably should have made a branch but I didn't, so ... here are the four commits:

https://github.com/YeoLab/outrigger/commit/bc003a0fe8d8ac53bd108e0fa2f5330e28e78155
https://github.com/YeoLab/outrigger/commit/c91bbc5e453a0701a5a58cc25632e3532947d1d8
https://github.com/YeoLab/outrigger/commit/dd505ca5cb866ef23b12297ae40c64d234d3a8d0
https://github.com/YeoLab/outrigger/commit/800d1437c97232d330f08e582029333da0f7c6b8