Open YIGUIz opened 2 months ago
It depends on what you want to do with the transcripts.
To assemble multi-sample data, you can follow the instructions here: https://github.com/bcgsc/RNA-Bloom?tab=readme-ov-file#b-assemble-multi-sample-rna-seq-data-with-pooled-assembly-mode
If you don't really care about the tissue specificity of assembled transcripts, then you can simply pass all FASTQ files as input. You do not need to merge the files. For example:
java -jar RNA-Bloom.jar \
-left sampleA_1.fastq sampleB_1.fastq sampleC_1.fastq \
-right sampleA_2.fastq sampleB_2.fastq sampleC_2.fastq \
-revcomp-right -t THREADS -outdir OUTDIR
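With many samples, the file lists in the command above can be built programmatically instead of typed by hand. The following is only a minimal sketch: the helper name `build_pooled_command` and the `<sample>_1.fastq` / `<sample>_2.fastq` naming convention are assumptions for illustration, and the function only builds the argument list (it does not run RNA-Bloom). The flags are the ones shown in the command above.

```python
# Hypothetical helper (not part of RNA-Bloom): builds, but does not run,
# a command that passes all samples' FASTQ files to -left and -right.
def build_pooled_command(samples, threads, outdir, jar="RNA-Bloom.jar"):
    """Return an argv list.

    Assumes reads are named <sample>_1.fastq and <sample>_2.fastq,
    and keeps left/right files in the same sample order.
    """
    left = [f"{s}_1.fastq" for s in samples]
    right = [f"{s}_2.fastq" for s in samples]
    return [
        "java", "-jar", jar,
        "-left", *left,
        "-right", *right,
        "-revcomp-right",
        "-t", str(threads),
        "-outdir", outdir,
    ]


cmd = build_pooled_command(["sampleA", "sampleB", "sampleC"], threads=8, outdir="pooled_out")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once the paths have been checked.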
Thank you very much.
My data consists of cDNA long-read (LR) RNA-seq and strand-specific paired-end short-read (SR) RNA-seq. I noticed that the software doesn't support multi-sample assembly for LR RNA-seq data, so I considered two strategies:
1. Assemble by sample, then merge the GTF files from the same tissue.
2. Merge the long-read FASTQ files into a single large FASTQ file, then perform the assembly. I also have paired short-read RNA-seq data, for which I've merged all clean FASTQ files into two large files (R1.fq and R2.fq).
I'm not sure whether this approach will work.
I have another question about the SR RNA-seq data. It is strand-specific paired-end data (fr-firststrand), so my command is:
rnabloom -t 20 -ntcard -artifact -long ${LR_clean_fq} -sef ${SR_fq1} ${SR_fq2} -fpr 0.005 -indel 20 -p 0.75 -Q 15 -overlap 100 -length 150
I'm confused because -sef is described as the path to one single-end forward read file.
Thank you for your assistance again!
RNA-Bloom doesn't generate GTF files.
You don't need to merge or concatenate read files for -long, -ser, and -sef. You can specify multiple file paths separated by spaces.
If your long-read data is not direct RNA-seq or not strand-specific, then you should not use the -strand option, because the strand of your short reads does not matter. So, you can specify both forward and reverse short-read files for -sef.
I don't recommend using the -artifact option. You will end up with a lot of incorrect assemblies.
Thank you very much. Due to the sample size, I have to assemble transcripts by sample and then merge them. Without GTF files, how can I generate the final transcript set (i.e., remove redundant transcripts)? I also need to merge the transcripts from different tissues. Thank you for your help. I have just started this work, so I have a lot of questions.
There could be much better ways to do this, but here is what I did in the past.
For each tissue:
minimap2 -c -x splice reference_genome.fasta rnabloom.transcripts.fa | gzip -c > rnabloom.transcripts.paf.gz
python make_gtf.py rnabloom.transcripts.paf.gz rnabloom.transcripts.gtf
Merge the GTFs from all tissues with gffcompare: https://ccb.jhu.edu/software/stringtie/gffcompare.shtml
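The make_gtf.py script itself is not shown in this thread. As a rough idea of what such a conversion can look like, here is a minimal sketch (a hypothetical stand-in, not the actual script). It assumes the PAF was produced with `minimap2 -c`, so each record carries a `cg:Z:` CIGAR tag, and it splits the alignment into exons at `N` (intron) operations. It also assumes the PAF strand column can be used as the transcript strand, and it emits every alignment record it sees, so filtering secondary alignments is left to the user.

```python
import gzip
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def paf_record_to_gtf(line, source="rnabloom"):
    """Convert one PAF line (with a cg:Z: tag) into GTF transcript/exon lines."""
    f = line.rstrip("\n").split("\t")
    qname, strand, tname, tstart = f[0], f[4], f[5], int(f[7])
    # Optional tags start after the 12 mandatory PAF columns.
    cigar = next((t[5:] for t in f[12:] if t.startswith("cg:Z:")), None)
    if cigar is None:
        return []  # no CIGAR available, cannot recover exon structure

    exons = []
    pos = tstart          # 0-based position on the target
    block_start = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":                      # intron: close the current exon
            exons.append((block_start + 1, pos))   # GTF is 1-based, inclusive
            pos += n
            block_start = pos
        elif op in "MD=X":                 # operations that consume the target
            pos += n
        # I, S, H, P do not consume the target
    exons.append((block_start + 1, pos))

    attrs = f'gene_id "{qname}"; transcript_id "{qname}";'
    rows = [("transcript", exons[0][0], exons[-1][1])]
    rows += [("exon", s, e) for s, e in exons]
    return [
        "\t".join([tname, source, feat, str(s), str(e), ".", strand, ".", attrs])
        for feat, s, e in rows
    ]


def paf_to_gtf(paf_gz_path, gtf_path):
    """Convert a gzipped PAF file to GTF, one transcript per alignment record."""
    with gzip.open(paf_gz_path, "rt") as paf, open(gtf_path, "w") as gtf:
        for line in paf:
            for row in paf_record_to_gtf(line):
                gtf.write(row + "\n")


# Small in-memory demo with a made-up alignment record.
demo = "tx1\t30\t0\t30\t+\tchr1\t1000\t100\t230\t30\t130\t60\ttp:A:P\tcg:Z:10M100N20M"
gtf_rows = paf_record_to_gtf(demo)
print("\n".join(gtf_rows))
```

For real files, `paf_to_gtf("rnabloom.transcripts.paf.gz", "rnabloom.transcripts.gtf")` would mirror the second command above.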
Thank you very much😊. I'll try it.
I'm sorry to ask again, but can this program handle 400 long-read and short-read RNA-seq datasets simultaneously by specifying multiple file paths separated by spaces? If not, how can I obtain a complete BAM file to ensure the program works? I would get a 15 TB BAM file if I used Samtools to merge them. Would that work?
Thank you!
I don't understand what a "complete BAM" is. RNA-Bloom is primarily a reference-free assembly tool. It does not generate any BAM files against any reference.
Regarding too many input files, you can put the paths of the read files in a text file, one path per line. You can specify the list file with the @ prefix.
Example:
List file for short reads, short_read_files.txt:
/path/to/short_reads_01.fastq
/path/to/short_reads_02.fastq
/path/to/short_reads_03.fastq
List file for long reads, long_read_files.txt:
/path/to/long_reads_01.fastq
/path/to/long_reads_02.fastq
/path/to/long_reads_03.fastq
Example command for the list files:
java -jar RNA-Bloom.jar \
-sef @/path/to/short_read_files.txt \
-long @/path/to/long_read_files.txt \
...
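With hundreds of files, the list files described above can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the helper name `write_list_file` and the glob pattern are hypothetical, and the demo runs on empty placeholder files in a temporary directory so it is self-contained. It writes one absolute path per line and shows how the `@` argument would then be formed.

```python
import tempfile
from pathlib import Path


def write_list_file(read_dir, pattern, list_path):
    """Write the sorted absolute paths matching `pattern` to `list_path`, one per line."""
    paths = sorted(str(p.resolve()) for p in Path(read_dir).glob(pattern))
    Path(list_path).write_text("".join(p + "\n" for p in paths))
    return paths


# Demo on placeholder files (hypothetical names following the example above).
with tempfile.TemporaryDirectory() as tmp:
    for i in (1, 2, 3):
        (Path(tmp) / f"short_reads_{i:02d}.fastq").touch()

    list_file = Path(tmp) / "short_read_files.txt"
    paths = write_list_file(tmp, "short_reads_*.fastq", list_file)
    listed = list_file.read_text().splitlines()

    # The list file is then passed with the @ prefix, as in the command above.
    sef_args = ["-sef", f"@{list_file}"]
    print(listed)
    print(sef_args)
```

Sorting the paths keeps the list reproducible between runs, which makes it easier to check that no sample was dropped.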
Hi, I want to assemble a final set of transcripts from multiple tissues, but I have a question. Should I first assemble transcripts for each sample, then merge the transcripts from the same tissue? Or should I merge all the FASTQ files from a tissue first, then perform the assembly?
Finally, is it correct to merge all transcripts from different tissues to get the final transcript assembly?
Thank you for your assistance!
Qi