fukunagatsu / RIblast

RIblast is ultrafast RNA-RNA interaction prediction software based on seed-and-extension algorithm for comprehensive lncRNA interactome analysis
MIT License

RI Blast parallelization enquiry. #17

Open ryashpal opened 3 years ago

ryashpal commented 3 years ago

Dear Sir,

My name is Yashpal Ramakrishnaiah, and I do bioinformatics research in the BioinformaticsLab led by Dr. Sonika Tyagi at Monash University, Melbourne, Australia. We are working on a web-based tool called linc2function to annotate long noncoding RNAs (lncRNAs) in real time. The pipeline can be accessed at https://tsonika-lab.erc.monash.edu/linc2function. We will be publishing a paper on it soon.

In the linc2function pipeline we have integrated RIblast to obtain RNA-RNA interactions, and we have found it very useful. We noticed a small issue while scaling it to bigger datasets. I am writing to ask whether there is any way to parallelise the prediction step so that it runs on multiple threads, which would let us display RNA-RNA interactions over a bigger dataset of RNAs in real time. Please let me know if you need more details; I will be happy to share them.

I would really appreciate it if you could guide us on the parallelization. I look forward to hearing from you.

Thanks, Yashpal Ramakrishnaiah

amatria commented 2 years ago

Hi :)

I am unsure if this is still relevant, but I am working on a parallel implementation of the RIblast algorithm (pRIblast). I have already accelerated the interaction search step and have submitted a paper on it (although it is still under review).

I am planning on merging the parallel database construction step soon. You can check out the code for this purpose here.

The new tool was parallelized using both MPI (distributed memory) and OpenMP (shared memory), allowing it to scale from standard desktop environments to multicore computing clusters and supercomputing facilities.
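
To illustrate the general data-parallel idea behind this kind of speedup, here is a minimal Python sketch that splits a set of query RNAs across workers so that each worker receives a similar total sequence length. It is not pRIblast's code, and how pRIblast actually distributes work is not described in this thread. The function name `partition_sequences`, the greedy longest-first strategy, and the example sequence names and lengths are all assumptions made for illustration.

```python
import heapq
from typing import Dict, List


def partition_sequences(lengths: Dict[str, int], n_workers: int) -> List[List[str]]:
    """Assign sequences to workers, balancing total length per worker.

    Hypothetical helper: greedy longest-first assignment to the worker
    that currently has the smallest load. Not taken from pRIblast.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")

    # Min-heap of (current load, worker index).
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    parts: List[List[str]] = [[] for _ in range(n_workers)]

    # Place the longest sequences first, always on the least-loaded worker.
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        parts[w].append(name)
        heapq.heappush(heap, (load + length, w))
    return parts


# Made-up sequence names and lengths, for illustration only.
example = {"rna_a": 900, "rna_b": 400, "rna_c": 350, "rna_d": 150}
parts = partition_sequences(example, 2)
print(parts)
```

Each resulting group could then be handed to a separate process or cluster node. Balancing by length rather than by sequence count is a design choice of this sketch, made on the assumption that search cost grows with sequence length.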

Some benchmarking results obtained on a cluster with 256 CPU cores are summarized in the following table:

| Dataset | RIblast (db + ris) | pRIblast (db + ris) |
| --- | --- | --- |
| Lepidothrix [1] | 480.0s + 7602.2s | 6.0s + 217.1s |
| Ursus [2] | 836.9s + 16714.6s | 5.0s + 119.0s |
| Drosophila [3] | 2496.6s + 53029.0s | 30.4s + 595.2s |
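
As a rough reading of the timings above, the end-to-end speedup for each dataset can be taken as the summed RIblast time (db + ris) divided by the summed pRIblast time. The snippet below only does that arithmetic on the posted numbers; the variable and function names are illustrative.

```python
# Timings in seconds, copied from the table: (db, ris) for each tool.
timings = {
    "Lepidothrix": ((480.0, 7602.2), (6.0, 217.1)),
    "Ursus": ((836.9, 16714.6), (5.0, 119.0)),
    "Drosophila": ((2496.6, 53029.0), (30.4, 595.2)),
}


def speedup(riblast, priblast):
    """Ratio of total RIblast time to total pRIblast time."""
    return sum(riblast) / sum(priblast)


speedups = {name: speedup(r, p) for name, (r, p) in timings.items()}
for name, value in speedups.items():
    print(f"{name}: {value:.1f}x")
```

This works out to roughly 36x for Lepidothrix, 142x for Ursus, and 89x for Drosophila on the 256-core cluster.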

Moreover, the new algorithm has been thoroughly optimized and reduces the memory requirements of the RNA interaction search step, meaning that pRIblast can process huge datasets on which RIblast alone may run out of memory.

By the way, I am mentioning @sabagh1994 too, since he asked kind of a similar question back in 2018. But still, I am unsure if this is even relevant for you two now.

Best regards, Iñaki Amatria Barral

===
[1] http://ftp.ensembl.org/pub/release-97/fasta/lepidothrix_coronata/ncrna/
[2] http://ftp.ensembl.org/pub/release-97/fasta/ursus_americanus/ncrna/
[3] http://ftp.ensembl.org/pub/release-97/fasta/drosophila_melanogaster/ncrna/

sabagh1994 commented 2 years ago

Hi Amatria,

Thanks for mentioning me regarding pRIblast. I am done with the project now, but it is really good to know about your implementation. Btw, you should have used "she", not "he" :). @amatria

Best,

Saba

fukunagatsu commented 2 years ago

Dear Amatria,

I appreciate your interest in my software and your work on improving it. I look forward to seeing your pRIblast paper published!

Best,

Tsukasa