acankaya2017 / COT6900

FAU Summer 2024 Independent Research

Parallelize R code #2

Open shankar4 opened 3 weeks ago

shankar4 commented 3 weeks ago

Adam, parallelize the code wherever there are bottlenecks and where it is feasible. Document the execution time before and after. Your goal for this DIS is to integrate the flow, improve its efficiency, and incorporate AI concepts. The semantic web concepts are already incorporated in the current flow through GO and KEGG.

Ravi Shankar

acankaya2017 commented 3 weeks ago

I have added some code to display the time each section takes. Not counting any download time, the entire notebook takes me about 2 minutes to run fully (128.9 s in the run below). Chunk 2 accounts for the bulk of that (80.4 s, about 62%). It covers creating the DGEList object, filtering, normalization, and fitting the model. Within Chunk 2, most of the time goes to calcNormFactors() and the voom normalization, and the model fitting also takes a noticeable share. Everything else is trivial, so I will need to focus on speeding up the normalization and fitting steps.

However, I am not sure multithreading or multicore parallelization is possible for these tasks, because each step is a single call into a library function. To do any real multiprocessing there, we would need to modify the library code itself.

Chunk 1 finished in 4.6s
Chunk 2 finished in 80.4s
Chunk 3 finished in 21.2s
Chunk 4 finished in 14.9s
Chunk 5 finished in 7s
Chunk 6 finished in 0.8s
Total run time: 128.9s
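One quick way to see where the time goes is to compute each chunk's share of the total from the logged numbers. The sketch below does that arithmetic in Python purely for illustration; it is not part of the R notebook, and its only inputs are the timings reported above.

```python
# Chunk timings (seconds) copied from the run log above.
times = {
    "Chunk 1": 4.6,
    "Chunk 2": 80.4,
    "Chunk 3": 21.2,
    "Chunk 4": 14.9,
    "Chunk 5": 7.0,
    "Chunk 6": 0.8,
}

# Total run time is the sum of the individual chunk times.
total = sum(times.values())

# Fraction of the total run time spent in each chunk.
shares = {name: t / total for name, t in times.items()}

# Time that would remain even if Chunk 2 took zero seconds.
# This is an upper bound on what speeding up Chunk 2 alone can save
# (the same reasoning as Amdahl's law).
rest = total - times["Chunk 2"]

print(f"Total: {total:.1f}s")
# List chunks from slowest to fastest by their share of the total.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {times[name]:.1f}s ({share:.1%})")
print(f"Everything except Chunk 2: {rest:.1f}s")
```

Chunk 2 comes out at about 62% of the run. Even if it were made instantaneous, the other five chunks would still take about 48.5 s, so optimizing normalization and fitting alone can cut total run time by at most roughly 2.7x.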