RobertsLab / resources

https://robertslab.github.io/resources/

Selecting RNAseq Experimental Parameters #1848

Closed valeste closed 3 months ago

valeste commented 3 months ago

I am currently trying to get an estimate on Genohub of what it would cost to process 12 samples for RNAseq, including sequencing and library prep.

I have selected the RNA (polyA-selected) project type, as I am aiming to evaluate changes in gene expression for protein-coding genes only.

A quick skim through the literature for other transcriptomic projects on my study species turned up the following instruments, read lengths, and reads per sample:

12 million 2x150 bp reads (Illumina NextSeq 500)

Illumina HiSeq 2500 platform (2 × 100 bp read length), which yielded a total of ~277 million paired-end reads.

However, looking through this list of quotes on Genohub suggests that Illumina NovaSeq may be a more affordable option. Is the instrument very important to the selection? Should I aim to select a service similar to those described in the papers I linked above?

Thanks!

kubu4 commented 3 months ago

Instrument is not important.

The primary thing you need to consider is obtaining enough reads to reach your desired sequencing "depth" per transcript. Basically, you need to perform a rough calculation to determine how many reads would be necessary to sequence each transcript N times.

This is so that you can have confidence in each base call, at each position of your transcript. This way, you end up with a "canonical" sequence for each transcript, without the need to be concerned that a sequencing error was responsible for any given base.

If there are published transcriptomes for your species of interest, then you can use the number of transcript sequences in those to help guide your calculations. Otherwise, you can use a genome size as your guide.
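As a rough illustration of the calculation described above, here is a minimal Python sketch. The function name `reads_for_depth` and all of the input values (transcript count, mean transcript length, target depth) are hypothetical placeholders, not numbers for any particular species. It also assumes reads are spread evenly across transcripts, which is not true in practice because expression levels vary widely, so treat the result as a ballpark only.

```python
import math


def reads_for_depth(n_transcripts, mean_transcript_len_bp, target_depth,
                    read_len_bp, paired=True):
    """Rough number of reads (read pairs if paired=True) needed to cover
    every transcript target_depth times on average.

    Assumes reads are distributed evenly across transcripts.
    """
    # Total bases that must be sequenced to reach the target average depth.
    total_bases = n_transcripts * mean_transcript_len_bp * target_depth
    # A paired-end fragment contributes two reads' worth of bases.
    bases_per_fragment = read_len_bp * (2 if paired else 1)
    return math.ceil(total_bases / bases_per_fragment)


# Hypothetical placeholder inputs: replace with values from a published
# transcriptome for your species.
pairs_needed = reads_for_depth(
    n_transcripts=30_000,
    mean_transcript_len_bp=1_500,
    target_depth=100,
    read_len_bp=150,   # 2x150 bp run
    paired=True,
)
print(f"{pairs_needed:,} read pairs per sample")
```

With these placeholder inputs the estimate comes out to 15,000,000 read pairs per sample. Changing the read length or the target depth shifts the number proportionally.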

Offhand, I can't remember what sequencing depth we usually target for RNA-seq, but I feel like there are common recommendations floating around the internet.

AHuffmyer commented 3 months ago

A common recommendation is 15-30M reads per sample, but this varies by species, purpose of analysis, etc. For example, the standard read depth for RNAseq from Azenta/Genewiz is 15-20M.
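To turn that range into a number you can compare against a quote, a quick sketch like the following may help. It assumes the 15-30M figure is per sample and uses the 12 samples from the original question. The `reads_per_lane` value is a made-up placeholder, since actual output depends on the instrument and flow cell the provider offers.

```python
import math


def total_reads(n_samples, reads_per_sample):
    """Total reads needed across all samples in the project."""
    return n_samples * reads_per_sample


def lanes_needed(total, reads_per_lane):
    """Whole lanes required to deliver the total read count."""
    return math.ceil(total / reads_per_lane)


N_SAMPLES = 12                    # from the original question
LOW, HIGH = 15_000_000, 30_000_000  # recommended range, assumed per sample

low_total = total_reads(N_SAMPLES, LOW)
high_total = total_reads(N_SAMPLES, HIGH)
print(f"Total reads: {low_total:,} to {high_total:,}")

# Hypothetical lane output: check the provider's quote for the real figure.
reads_per_lane = 400_000_000
print(f"Lanes at high end: {lanes_needed(high_total, reads_per_lane)}")
```

For 12 samples this works out to 180M to 360M reads in total, which is the figure to look for when comparing services.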

Here are some resources on read depth and experimental design: Genewiz FAQ; Liu et al. 2014; Lamarre et al. 2018; Schurch et al. 2016; Conesa et al. 2016.