bcgsc / RNA-Bloom

Reference-free transcriptome assembly for short and long reads

Assemble multiple bulk RNA samples #37

Closed xiekunwhy closed 2 years ago

xiekunwhy commented 2 years ago

Hi,

If I have many bulk RNA samples (from different tissues or different individuals), what is the best way to assemble the data?

1) Merge all FASTQ files by concatenation (zcat *.R1.fq.gz | gzip -c > merge_1.fq.gz; zcat *.R2.fq.gz | gzip -c > merge_2.fq.gz) and then use RNA-Bloom to assemble the merged FASTQ files.

2) Use RNA-Bloom to assemble each sample separately and merge the assemblies.

Best, Kun

kmnip commented 2 years ago

There are 3 approaches.

  1. Assemble all samples together (as if all reads were from a single sample). In this case, you don't need to merge any FASTQ files. Make sure you list the files in the same sample order for -left and -right. For example, if you have two samples, sample1 and sample2:

    java -jar RNA-Bloom.jar -left sample1_1.fq.gz sample2_1.fq.gz -right sample1_2.fq.gz sample2_2.fq.gz -revcomp-right ...

  2. Pooled assembly of your samples with the -pool and -mergepool options. Each sample is assembled using the pooled de Bruijn graph and all assemblies are merged together.

    java -jar RNA-Bloom.jar -pool READSLIST.txt -mergepool

    Please refer to the README here: https://github.com/bcgsc/RNA-Bloom/tree/v1.4.3#b-assemble-single-cell-rna-seq-data-with-pooled-assembly-mode

    PS. It is very important to note that the input file format for version 1.4.3 is different from the one on the master branch, which is for an upcoming version: https://github.com/bcgsc/RNA-Bloom#b-assemble-multi-sample-rna-seq-data-with-pooled-assembly-mode

  3. Assemble each sample separately and merge the assemblies with BBMap's dedupe: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/dedupe-guide/
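For the first approach, the main thing to get right is that sample N sits at the same position in both the -left and -right lists. A minimal sketch of one way to guarantee that, assuming every sample follows the `<name>_1.fq.gz` / `<name>_2.fq.gz` naming used in the example above (the function name `build_rnabloom_cmd` is made up for illustration). It only prints the command so it can be checked before running, and other options such as threads or the output directory would still need to be appended:

```shell
# Sketch only: build matching -left/-right lists for a joint RNA-Bloom run.
# Assumption: each sample has files named <name>_1.fq.gz and <name>_2.fq.gz.
build_rnabloom_cmd() {
    left=""
    right=""
    for s in "$@"; do
        left="$left ${s}_1.fq.gz"     # read 1 of sample s
        right="$right ${s}_2.fq.gz"   # read 2 of the same sample, same position
    done
    # Print the command instead of running it, so the order can be inspected.
    echo "java -jar RNA-Bloom.jar -left$left -right$right -revcomp-right"
}
```

Calling `build_rnabloom_cmd sample1 sample2` prints the same command as in the example for approach 1, minus the trailing options.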

I recommend the 2nd method if you have a large-memory server and don't have too many samples.
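If you go with the third approach instead, the merge step could look roughly like the sketch below. It assumes BBTools is installed with `dedupe.sh` on the PATH, and that `dedupe.sh` accepts a comma-delimited list of files for `in=`, as its documentation describes. The helper name `dedupe_cmd` and the per-sample paths are hypothetical, so adjust them to wherever each per-sample assembly was actually written. The helper prints the command rather than running it:

```shell
# Sketch only: print a dedupe.sh command that merges per-sample assemblies.
# $1 = output FASTA; remaining arguments = per-sample assembly FASTA files.
dedupe_cmd() {
    out="$1"
    shift
    inputs=""
    for f in "$@"; do
        # Join the input paths with commas (no leading comma on the first one).
        inputs="${inputs:+$inputs,}$f"
    done
    echo "dedupe.sh in=$inputs out=$out"
}
```

For example, `dedupe_cmd merged.fa sample1/rnabloom.transcripts.fa sample2/rnabloom.transcripts.fa` prints a single `dedupe.sh` call covering both assemblies. Check the dedupe guide linked above for options that control how duplicates and containments are handled.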

xiekunwhy commented 2 years ago

I am trying pooled assembly.

I have another question about reference-guided assembly: may I use StringTie (or stringtie merge) results as reference transcripts? I don't have real reference transcripts since I am working on a de novo genome.

kmnip commented 2 years ago

Yes, but the input needs to be a FASTA file.
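StringTie writes GTF, so the transcripts have to be extracted as sequences first. One possible way, not the only one, is `gffread` with the genome assembly. The sketch below assumes `gffread` is installed; the file names and the helper names `run` and `gtf_to_transcript_fasta` are hypothetical. By default it only prints the command:

```shell
# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = genome FASTA, $2 = StringTie GTF, $3 = output transcript FASTA
# (all file names are placeholders for illustration)
gtf_to_transcript_fasta() {
    # gffread -w writes the spliced exon sequence of each transcript in the GTF
    run gffread -w "$3" -g "$1" "$2"
}
```

For example, `gtf_to_transcript_fasta genome.fa stringtie_merged.gtf stringtie_transcripts.fa` prints the `gffread` call. The resulting FASTA would then be the reference input to RNA-Bloom (the README describes a `-ref` option for this; confirm it against the version you have installed).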