bcgsc / RNA-Bloom

:hibiscus: reference-free transcriptome assembly for short and long reads

Consensus transcriptome de novo with multiple samples #70

Open · nriareal opened this issue 2 months ago

nriareal commented 2 months ago

I'm reaching out about an issue I encountered while using RNA-Bloom version 2.0.1 with JDK 17. Specifically, I'm trying to generate a de novo assembly from 32 samples of transcriptomic data.

Initially, I attempted to create a comprehensive assembly by merging the information from all 32 samples using the "(B) assemble multi-sample RNA-seq data with pooled assembly mode" feature. It did work. However, I observed that instead of a unified assembly, I received multiple output files, each corresponding to an individual sample. This prompted me to question whether the multi-sample assembly mode indeed facilitates the creation of a consensus transcriptome assembly.

In an effort to address this issue, I also explored the "(A) assemble bulk RNA-seq data" mode. However, I encountered a limitation wherein the tool only accepts a single LEFT and RIGHT file, thus hindering my ability to perform a multiple alignment.

Could you please clarify whether the consensus de novo transcriptome assembly needs to be performed as a separate step? Additionally, does the bulk RNA-seq assembly mode require a specific input file format, or are there additional steps I should take to obtain a consensus assembly across multiple samples?

Your guidance and assistance in resolving these queries would be greatly appreciated.

kmnip commented 2 months ago

> Initially, I attempted to create a comprehensive assembly by merging the information from all 32 samples using the "(B) assemble multi-sample RNA-seq data with pooled assembly mode" feature. It did work. However, I observed that instead of a unified assembly, I received multiple output files, each corresponding to an individual sample. This prompted me to question whether the multi-sample assembly mode indeed facilitates the creation of a consensus transcriptome assembly.

If you would like a single assembly that merges the per-sample assemblies of all 32 samples, add the -mergepool option to your command, for example:

java -jar RNA-Bloom.jar -pool reads_list.txt -mergepool ...

This extra step can be quite memory-intensive, so it is not enabled by default.
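With 32 samples, writing the list file for `-pool` by hand is tedious and easy to get wrong. Below is a minimal POSIX shell sketch that generates it, assuming (hypothetically) that each sample's reads are named `<sample>_1.fq.gz` and `<sample>_2.fq.gz` in a single directory. The layout it writes (a `#name left right` header, then one line per sample with the sample name, left reads path, and right reads path) is our reading of the example in the RNA-Bloom README; please check it against the README for your version before relying on it. The function name `write_pool_list` is hypothetical.

```sh
# Sketch (hypothetical helper): write a sample list for RNA-Bloom's -pool mode.
# ASSUMPTIONS: reads are named <sample>_1.fq.gz / <sample>_2.fq.gz in one
# directory, and the list layout is a "#name left right" header followed by
# one "sample left right" line per sample (verify against the RNA-Bloom README).
# Paths containing whitespace are not supported.
# The body runs in a subshell ( ... ), so "exit 1" only fails the function.
write_pool_list() (
  dir="$1"
  list="$2"
  n=0
  printf '#name left right\n' > "$list"
  for left in "$dir"/*_1.fq.gz; do
    [ -e "$left" ] || continue            # glob matched nothing
    sample=$(basename "$left" _1.fq.gz)   # strip the _1.fq.gz suffix
    right="$dir/${sample}_2.fq.gz"        # derive the mate from the left file
    if [ ! -e "$right" ]; then
      echo "missing mate for $sample" >&2
      exit 1
    fi
    printf '%s %s %s\n' "$sample" "$left" "$right" >> "$list"
    n=$((n + 1))
  done
  if [ "$n" -eq 0 ]; then
    echo "no *_1.fq.gz files in $dir" >&2
    exit 1
  fi
)
```

For example, define the function in your shell, run `write_pool_list reads reads_list.txt`, then pass `reads_list.txt` to the `-pool ... -mergepool` command above. Deriving each right file from its left file, rather than globbing the two sets separately, guarantees that every line pairs mates from the same sample.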

> In an effort to address this issue, I also explored the "(A) assemble bulk RNA-seq data" mode. However, I encountered a limitation wherein the tool only accepts a single LEFT and RIGHT file, thus hindering my ability to perform a multiple alignment.

For bulk RNA-seq assembly, RNA-Bloom can accept multiple pairs of left and right FASTQ files as input, e.g.

java -jar RNA-Bloom.jar -left sample01_1.fq.gz sample02_1.fq.gz -right sample01_2.fq.gz sample02_2.fq.gz ...

Left and right files are paired by their position in each list: in this example, sample01_1.fq.gz pairs with sample01_2.fq.gz, and sample02_1.fq.gz pairs with sample02_2.fq.gz.
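Because pairing depends on order, the easiest mistake with 32 samples is building the `-left` and `-right` lists separately and letting them drift out of sync. Below is a minimal POSIX shell sketch that derives each right file from its left file so the n-th entries always match. It again assumes hypothetical `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming, uses a hypothetical function name `build_bulk_cmd`, and prints the command instead of running it.

```sh
# Sketch (hypothetical helper): print an RNA-Bloom bulk-mode command whose
# -left and -right lists are built in the same order, so the n-th left file
# pairs with the n-th right file.
# ASSUMPTION: reads are named <sample>_1.fq.gz / <sample>_2.fq.gz in one directory.
# Paths containing whitespace are not supported.
# The body runs in a subshell ( ... ), so "exit 1" only fails the function.
build_bulk_cmd() (
  dir="$1"
  lefts=""
  rights=""
  for left in "$dir"/*_1.fq.gz; do
    [ -e "$left" ] || continue            # glob matched nothing
    sample=$(basename "$left" _1.fq.gz)   # strip the _1.fq.gz suffix
    right="$dir/${sample}_2.fq.gz"        # derive the mate from the left file
    if [ ! -e "$right" ]; then
      echo "missing mate for $sample" >&2
      exit 1
    fi
    lefts="$lefts $left"
    rights="$rights $right"
  done
  if [ -z "$lefts" ]; then
    echo "no *_1.fq.gz files in $dir" >&2
    exit 1
  fi
  # "..." stands for your remaining RNA-Bloom options, as in the example above.
  printf '%s\n' "java -jar RNA-Bloom.jar -left$lefts -right$rights ..."
)
```

For example, `build_bulk_cmd reads` prints a command in the same shape as the two-sample example above, with one entry per sample in each list. Printing rather than executing lets you inspect the pairing before starting a long assembly run.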