a-slide / NanoCount

EM-based transcript abundance estimation from nanopore reads mapped to a transcriptome with minimap2
https://a-slide.github.io/NanoCount/
MIT License

Can I map reads directly to genes instead of transcripts to estimate expression at the gene level? #37

Closed kerenzhou062 closed 1 month ago

kerenzhou062 commented 2 months ago

Hi Josie,

I'm wondering whether I can obtain read counts per gene by mapping reads directly to the sequence of the whole gene locus instead of to transcripts (minimap2 -ax splice). Here is an example of the FASTA file:

>ENSG00000240409.1
ATGCCCCAACTAAATACTACCGTATGACCCACCATAATTACCCCCATACTCCTTACACTATTCCTCATCACCCAACTAAAAATATTAAATACAAATTACCACCTACCTCCCTCACCAAAGCCCATAAAAATAAAAAACTATAACAAACCCTGAGAACCAAAATGAACGAAAATCTGTTCACTTCATTCATTGCCCCCACAATCCTAG
>ENSG00000248527.1
ATGAACGAAAATCTGTTCACTTCATTCATTGCCCCCACAATCCTAGGCCTACCCGCCGCAGTACTGATCATTCTATTTCCCCCTCTATTGATCCCCACCTCCAAATATCTCATCAACAACCGACTAATTACCACCCAACAATGACTAATCCAACTAACCTCAAAACAAATGATAGCCATACACAACACTAAGGGACGAACCTGATCTCTTATACTAGTATCCTTAATCATTTTTATTGCCACAACTAACCTCCTCGGACTCCTGCCTCACTCATTTACACCAACCACCCAACTATCTATAAACCTAGCCATGGCCATCCCCTTATGAGCGGGCGCAGTGATTATAGGCTTTCGCTCTAAGATTAAAAATGCCCTAGCCCACTTCTTACCACAAGGCACACCTACACCCCTTATCCCTATACTAGTTATTATCGAAACCATCAGCCTACTCATTCAACCAATAGCCCTGGCCGTACGCCTAACCGCTAACATTACTGCAGGCCACCTACTCATGCACCTAATTGGAAGCGCCACACTAGCAATATCAACTATTAACCTTCCCTCTACACTTATCATCTTCACAATTCTAATTCTACTGACTATCCTAGAAATCGCTGTCGCCTTAATCCAAGCCTACGTTTTTACACTTCTAGTAAGCCTCTACCTGCACGACAACACATAA
>ENSG00000198744.5
ATGACCCACCAATCACATGCCTATCATATAGTAAAACCCAGCCCATGGCCCCTAACAGGGGCCCTCTCAGCCCTCCTAATGACCTCCGGCCTAGCCATGTGATTTCACTTCCACTCCACAACCCTCCTCATACTAGGCCTACTAACCAACACACTAACCATATACCAATGATGGCGCGATGTAACACGAGAAAGCACATACCAAGGCCACCACACACCACCTGTCCAGAAAGGCCTTCGATACGGGATAATCCTATTTATTACCTCAGAAGTTTTTTTCTTCGCAGGATTTTTCTGAGCCTTTTACCACTCCAGCCTAGCTCCCACCCCCCAACTAGGGGGACACTGGCCCCCAACAGGCATCACCCCGCTAAATCCCCTAGAAGTCCCACTCCTAAACACATCCGTATTACTCGCATCAGGGGTATCAATCACCTGAGCTCACCATAGTCTAATAGAAAACAACCGAAACCAAATAATTCAAGCACTGCTTATTACAATTTTACTGGGTC

Thanks! Keren

josiegleeson commented 1 month ago

Hi Keren, I haven't personally done this before. Usually I'd align to the genome with minimap2 in spliced mode (-ax splice) and use something like featureCounts to count reads per gene. Otherwise, I'd recommend using tximport in R to summarise your transcript-level expression data to gene level, which basically sums the expression of all transcripts within each gene.

Hope that helps!
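For illustration only, here is a minimal Python sketch of the "sum the transcripts within each gene" idea. It is not tximport and not part of NanoCount. The function name `sum_to_gene_level`, the column names (`transcript_name`, `est_count`, `tpm`), the transcript-to-gene mapping, and the toy IDs and values are all assumptions made for the example; adjust them to match your own table and annotation.

```python
from collections import defaultdict


def sum_to_gene_level(transcript_rows, tx2gene, value_keys=("est_count", "tpm")):
    """Sum per-transcript values into per-gene totals.

    transcript_rows: iterable of dicts, one per transcript (hypothetical layout).
    tx2gene: dict mapping transcript ID -> gene ID (e.g. built from a GTF).
    Returns (gene_totals, unmapped_transcript_ids).
    """
    gene_totals = defaultdict(lambda: {key: 0.0 for key in value_keys})
    unmapped = []
    for row in transcript_rows:
        gene = tx2gene.get(row["transcript_name"])
        if gene is None:
            # Keep track of transcripts with no gene assignment instead of dropping them silently.
            unmapped.append(row["transcript_name"])
            continue
        for key in value_keys:
            gene_totals[gene][key] += float(row[key])
    return dict(gene_totals), unmapped


# Toy input: made-up IDs and values, purely illustrative.
rows = [
    {"transcript_name": "tx1", "est_count": 10.0, "tpm": 100.0},
    {"transcript_name": "tx2", "est_count": 5.0, "tpm": 50.0},
    {"transcript_name": "tx3", "est_count": 2.0, "tpm": 20.0},
    {"transcript_name": "tx4", "est_count": 1.0, "tpm": 10.0},
]
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_totals, unmapped = sum_to_gene_level(rows, tx2gene)
for gene, totals in sorted(gene_totals.items()):
    print(gene, totals)
print("unmapped:", unmapped)
```

In practice the rows would come from reading the transcript-level table with `csv.DictReader` (tab-delimited), and `tx2gene` from the annotation used to build the transcriptome.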