bcgsc / RNA-Bloom

:hibiscus: reference-free transcriptome assembly for short and long reads

Assemble multiple bulk RNA samples #37

Closed xiekunwhy closed 2 years ago

xiekunwhy commented 2 years ago


If I have many bulk RNA samples (from different tissues or different samples), what is the best way to assemble these data?

1) Merge all FASTQ files with cat (zcat *.R1.fq.gz|gzip -c > merge_1.fq.gz; zcat *.R2.fq.gz|gzip -c > merge_2.fq.gz;) and then use RNA-Bloom to assemble the merged FASTQ files.

2) Use RNA-Bloom to assemble each sample separately and merge the assemblies.

Best, Kun

kmnip commented 2 years ago

There are 3 approaches.

  1. Assemble all samples together (as if all reads were from a single sample). In this case, you don't need to merge any FASTQ files. Make sure you specify the files in the same order for -left and -right. For example, if you have two samples, sample1 and sample2:

    java -jar RNA-Bloom.jar -left sample1_1.fq.gz sample2_1.fq.gz -right sample1_2.fq.gz sample2_2.fq.gz -revcomp-right ...
  2. Pooled assembly of your samples with the -pool and -mergepool options. Each sample is assembled using the pooled de Bruijn graph and all assemblies are merged together.

    java -jar RNA-Bloom.jar -pool READSLIST.txt -mergepool

    Please refer to the README here:

    PS. It is very important to note that the format of the input file for version 1.4.3 is different from that on the master branch, which is for an upcoming version:

  3. Assemble each sample separately and merge the assemblies with BBMap's dedupe:

I recommend the 2nd method if you have a large memory server and don't have too many samples.
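
For the first approach, the order of the files given to -left and -right has to match sample by sample. One way to avoid a mismatch is to build both lists from the same list of sample names. The following is only a sketch: the function name `build_rnabloom_cmd` is made up for illustration, and it assumes the files are named `<sample>_1.fq.gz` and `<sample>_2.fq.gz` as in the example above. It prints the command rather than running it, so any further options can be appended before use.

```shell
# Hypothetical helper: prints an RNA-Bloom command whose -left and -right
# lists are built from the same sample names, so their order always matches.
# Assumes the naming scheme <sample>_1.fq.gz / <sample>_2.fq.gz.
build_rnabloom_cmd() {
    left=""
    right=""
    for s in "$@"; do
        left="$left ${s}_1.fq.gz"     # read 1 of this sample
        right="$right ${s}_2.fq.gz"   # read 2 of the same sample, same position
    done
    echo "java -jar RNA-Bloom.jar -left$left -right$right -revcomp-right"
}

# Print the command for two samples; append other options as needed.
build_rnabloom_cmd sample1 sample2
```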

xiekunwhy commented 2 years ago

I am trying Pooled assembly.

I have another question about reference-guided assembly: may I use StringTie (or StringTie merge) results as reference transcripts? I have no real reference transcripts since I am working on a de novo genome.

kmnip commented 2 years ago

Yes, but the input needs to be a FASTA file.
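
StringTie writes its transcripts as GTF, so they need to be converted to FASTA first. This thread does not say how; one common option, offered here as an assumption, is gffread, which can extract transcript sequences from a GTF given the genome FASTA. The file names below are placeholders. The snippet only prints the command as a dry run.

```shell
# Placeholder file names; replace with your own.
GENOME=genome.fa              # draft genome assembly the reads were aligned to
GTF=stringtie_merged.gtf      # StringTie (or StringTie merge) output
OUT=stringtie_transcripts.fa  # transcript FASTA to be written

# gffread: -w writes the spliced exon sequence of each transcript,
#          -g names the genome FASTA the GTF coordinates refer to.
CMD="gffread -w $OUT -g $GENOME $GTF"
echo "$CMD"   # dry run: prints the command; run it with: eval "$CMD"
```

The resulting FASTA file could then be supplied as the reference transcripts.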