a-slide / NanoCount

EM based transcript abundance from nanopore reads mapped to a transcriptome with minimap2
https://a-slide.github.io/NanoCount/
MIT License

about data of nanocount paper #16

Closed by rezarahman12 1 year ago

rezarahman12 commented 1 year ago

Hi Josie,

This is not an issue with NanoCount but rather a discussion. I'd appreciate your expert opinion on using the NanoCount data. I am planning a tutorial project for learning dRNA-Seq analysis, based on your NanoCount paper.

The steps are outlined below:

  1. Download the fast5 and fastq datasets of the NanoCount paper from the ENA database.
  2. Align them to the Gencode transcriptome (not genome) reference using minimap2.
  3. Apply NanoCount to estimate transcript abundance.
  4. Run DE analysis using DESeq or edgeR.
  5. Identify novel isoforms/alternative splicing using NanoCount.
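Steps 2 and 3 could be sketched roughly as below. This is only an illustration, not a tested pipeline: the file names and the thread count are placeholders, the helper functions `build_commands` and `run_pipeline` are hypothetical, and the minimap2 options (`-ax map-ont -p 0 -N 10`) are my recollection of the settings suggested in the NanoCount documentation, so please check them against the current docs before relying on them.

```python
import subprocess


def build_commands(transcriptome, fastq, bam, counts_tsv):
    """Build argument lists for alignment, BAM conversion and quantification.

    All paths are placeholders supplied by the caller.
    """
    # Align reads to the transcriptome (not the genome), keeping secondary hits.
    minimap2_cmd = ["minimap2", "-t", "4", "-ax", "map-ont",
                    "-p", "0", "-N", "10", transcriptome, fastq]
    # Convert the SAM stream from minimap2 to BAM, keeping the header.
    samtools_cmd = ["samtools", "view", "-bh"]
    # Estimate transcript abundance from the alignments.
    nanocount_cmd = ["NanoCount", "-i", bam, "-o", counts_tsv]
    return minimap2_cmd, samtools_cmd, nanocount_cmd


def run_pipeline(transcriptome, fastq, bam, counts_tsv):
    """Run minimap2 | samtools view > bam, then NanoCount on the BAM file."""
    minimap2_cmd, samtools_cmd, nanocount_cmd = build_commands(
        transcriptome, fastq, bam, counts_tsv)
    with open(bam, "wb") as bam_out:
        aligner = subprocess.Popen(minimap2_cmd, stdout=subprocess.PIPE)
        converter = subprocess.Popen(samtools_cmd, stdin=aligner.stdout,
                                     stdout=bam_out)
        aligner.stdout.close()  # let minimap2 see a closed pipe if samtools exits
        converter.communicate()
        if aligner.wait() != 0 or converter.returncode != 0:
            raise RuntimeError("alignment or BAM conversion failed")
    subprocess.run(nanocount_cmd, check=True)
```

Calling `run_pipeline("transcriptome.fa", "reads.fastq", "aligned.bam", "counts.tsv")` would need minimap2, samtools and NanoCount on the `PATH`; `build_commands` alone only assembles the command lines.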

I need clarification on step 2. Do I need to map reads to the sequin transcriptome if I am not interested in benchmarking tools? And do I need to map the reads to the reference genome if I want to find DE genes?

I'd greatly appreciate your guidance and opinion in this regard.

I'm sorry that I am taking your precious time.

Kind regards,
Reza

josiegleeson commented 1 year ago

Hi Reza,

Apologies for the late reply!

The tutorial looks good. As for your questions: no, don't bother mapping to the sequins. The sequin reads just won't map anywhere, so you should be fine. And you can avoid mapping to the genome by summarising your transcript-level counts to the gene level using the R package tximport.
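To illustrate what summarising to the gene level means, here is a minimal Python sketch that simply sums transcript counts per gene. It is a stand-in for the idea, not a reimplementation of tximport, which is an R package and does more than plain summation. It assumes the count table has `transcript_name` and `est_count` columns and that transcript names follow the pipe-delimited Gencode FASTA header layout, where the second field is the gene ID. The IDs in the example are made up.

```python
import csv
import io
from collections import defaultdict


def gene_id_from_gencode_name(transcript_name):
    """Return the gene ID (second '|' field) of a Gencode-style name.

    Falls back to the full name if the name is not pipe-delimited.
    """
    fields = transcript_name.split("|")
    return fields[1] if len(fields) > 1 else transcript_name


def sum_counts_by_gene(tsv_handle, count_column="est_count"):
    """Sum a transcript-level count column per gene from a TSV file handle."""
    totals = defaultdict(float)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        gene_id = gene_id_from_gencode_name(row["transcript_name"])
        totals[gene_id] += float(row[count_column])
    return dict(totals)


# Made-up example table: two transcripts of one gene, one of another.
example = io.StringIO(
    "transcript_name\test_count\n"
    "ENST0000001.1|ENSG0000001.1|-|-|GENEA-201|GENEA|1000|protein_coding|\t10.5\n"
    "ENST0000002.1|ENSG0000001.1|-|-|GENEA-202|GENEA|800|protein_coding|\t4.5\n"
    "ENST0000003.1|ENSG0000002.1|-|-|GENEB-201|GENEB|1200|protein_coding|\t3.0\n"
)
gene_counts = sum_counts_by_gene(example)
print(gene_counts)
```

For a real analysis, tximport (or a similar tool) is still the better choice, since DESeq or edgeR expect a properly built gene-by-sample matrix rather than a single summed column.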

As for novel isoform identification, you'll have to run something like FLAIR or bambu first.

Hope that helps, let me know if you have any more questions.