VCCRI / Sierra

Discover differential transcript usage from polyA-captured single cell RNA-seq data
GNU General Public License v3.0

Which alignment method and indexing options are suitable to use with Sierra? #50

Closed hkarakurt8742 closed 2 years ago

hkarakurt8742 commented 2 years ago

Hello everyone, thank you for developing Sierra. I want to use it on some scRNA-Seq data, and I assume the alignment step is crucial here. I believe STAR works fine in this case. Is it okay to use other aligners such as HISAT2?

Also, building a STAR genome index involves many parameters, including the splice junction database parameters. Do these need to be set during indexing to get the best performance from Sierra?

davhum commented 2 years ago

Hi @hkarakurt8742,

We have only really tested Sierra on pipelines that use the STAR aligner (e.g. cellranger or STARsolo). In this case it is best to incorporate splice junctions via a GTF file when building the genome index.
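As a rough illustration of the indexing advice above, here is a minimal Python sketch that assembles a STAR `genomeGenerate` command with annotated splice junctions supplied through `--sjdbGTFfile`. The helper name `star_index_command`, the file names, the thread count, and the default read length of 91 are all assumptions made for the example; the STAR manual suggests setting `--sjdbOverhang` to the maximum read length minus 1, so adjust `read_length` to your own data.

```python
import shlex


def star_index_command(genome_dir, fasta, gtf, read_length=91, threads=8):
    """Build the argument list for a STAR genome-index run with a GTF annotation.

    Hypothetical helper for illustration; all paths and defaults are placeholders.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),
        # Annotated splice junctions are taken from the GTF file.
        "--sjdbGTFfile", str(gtf),
        # STAR manual guidance: maximum read length minus 1.
        "--sjdbOverhang", str(read_length - 1),
        "--runThreadN", str(threads),
    ]


if __name__ == "__main__":
    cmd = star_index_command("star_index", "genome.fa", "genes.gtf")
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list keeps it easy to inspect before running and avoids shell-quoting problems with unusual paths. If you use cellranger, the reference is built by its own tooling instead, so this sketch applies mainly to a direct STAR or STARsolo workflow.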

I don't believe HISAT2 can be used for 10x single-cell data, so you may want to double-check this. Note that HISAT2 can be used for full-length single-cell data, but this isn't the kind of data Sierra was designed to work on.

hkarakurt8742 commented 2 years ago

Thank you for your answer. I will keep working with STAR. I have one follow-up question: you mentioned that Sierra was not designed to work on full-length single-cell RNA-Seq data, so is it not suitable for Smart-Seq2-based scRNA-Seq data?

davhum commented 2 years ago

Correct: Sierra is not suitable for Smart-seq2 data. Smart-seq2 provides coverage across the whole transcript, so with enough coverage you should be able to determine transcript isoforms either visually or perhaps with tools designed for bulk RNA-Seq (although I'm not sure how well these will perform). Sierra doesn't work on Smart-seq2 data because it will not reliably find pile-ups of reads (i.e. peaks) at consistent places within transcripts.
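To make the "piles of reads" point concrete, here is a toy Python sketch. It is not Sierra's peak caller: the function `max_window_fraction`, the 300-base window, and the read positions are all made up for illustration. It measures what fraction of reads start within the densest single window of a transcript, which should be high for 3'-biased data and low for evenly spread full-length coverage.

```python
def max_window_fraction(read_starts, window=300):
    """Return the largest fraction of reads starting within any one window.

    Toy concentration measure (an assumption for illustration), not Sierra's method.
    """
    if not read_starts:
        return 0.0
    starts = sorted(read_starts)
    best = 0
    left = 0
    # Sliding window over sorted positions: keep reads less than `window` bases apart.
    for right in range(len(starts)):
        while starts[right] - starts[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best / len(starts)


# Made-up read start positions on a hypothetical 2000-base transcript.
three_prime_like = [1700 + (i * 7) % 250 for i in range(100)]  # clustered near the 3' end
full_length_like = [i * 20 for i in range(100)]                # spread evenly along the transcript

if __name__ == "__main__":
    print(max_window_fraction(three_prime_like))  # 1.0: every read falls in one window
    print(max_window_fraction(full_length_like))  # 0.15: no window holds more than 15 reads
```

In the clustered case a single window captures every read, which is the kind of consistent pile-up a peak-based method can anchor on; in the evenly spread case no position stands out.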