cougarlj / COMPSRA

COMPSRA: a COMprehensive Platform for Small RNA-Seq data Analysis
https://regepi.bwh.harvard.edu/circurna/
GNU General Public License v3.0

How to extract length of sequences? #52

Status: Open. kenminsoo opened this issue 1 year ago.

kenminsoo commented 1 year ago

I am not looking to do differential expression, so I did not open the DE module.

However, I would still like to normalize the count data by transcript length.

Is there any way to obtain this from the pipeline, or do I have to look up the original sequences in the respective databases to obtain the lengths of the annotated transcripts?

Thank you!
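If you do end up downloading the annotated sequences as FASTA (for example mature miRNAs from miRBase, or snoRNA/snRNA sequences from another database), the length of each transcript is just the number of bases in its record. Below is a minimal Python sketch, assuming plain-text FASTA where the first whitespace-delimited token of each header is the sequence ID, and assuming those IDs match the feature names in the COMPSRA count tables (not confirmed in this thread). The function name and the toy records are hypothetical.

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence ID: length in nt} for FASTA-formatted text lines.

    The ID is the first whitespace-delimited token after '>'.
    Sequences may be wrapped over several lines; blank lines are ignored.
    """
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: start a fresh counter for this ID.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            # Sequence line: add its bases to the current record.
            lengths[current] += len(line)
    return lengths


if __name__ == "__main__":
    # Toy records (hypothetical IDs), the second wrapped over two lines.
    toy = [
        ">toy_miRNA example record",
        "ACGUACGUACGUACGUACGUAC",
        ">toy_snoRNA example record",
        "ACGUACGUAC",
        "GUACGU",
    ]
    print(fasta_lengths(toy))  # {'toy_miRNA': 22, 'toy_snoRNA': 16}
```

For a real file you could pass an open handle directly, e.g. `with open("mature.fa") as fh: lengths = fasta_lengths(fh)` (the file name is a placeholder).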

cougarlj commented 1 year ago

Dear kenminsoo,

Do you mean you want to know the length of miRNAs? If so, you can get this information from miRBase. If you want to know the length of the reads, you have to check the FASTQ files after QC, or even the BAM files. Mature miRNAs are usually about 21 nt long, so you can use the miRNA counts directly in your downstream analysis.

Best wishes, Jiang Li
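To check read lengths in the QC'd FASTQ files as suggested above, you can tally the length of every read. Below is a minimal Python sketch, assuming standard four-line FASTQ records (sequence on the second line of each record, no line wrapping), optionally gzip-compressed. The function names are hypothetical, and the sketch does not handle BAM files.

```python
import gzip
from collections import Counter
from typing import Iterable, TextIO


def read_length_counts(lines: Iterable[str]) -> Counter:
    """Tally read lengths from FASTQ text lines (4 lines per record)."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


if __name__ == "__main__":
    # Toy reads: two of 22 nt and one of 18 nt.
    toy = ["@r1", "A" * 22, "+", "I" * 22,
           "@r2", "C" * 18, "+", "I" * 18,
           "@r3", "G" * 22, "+", "I" * 22]
    print(sorted(read_length_counts(toy).items()))  # [(18, 1), (22, 2)]
```

On a real file this would look like `with open_fastq("sample_clean.fastq.gz") as fh: print(sorted(read_length_counts(fh).items()))`, where the file name is a placeholder for whatever the QC step produced.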

(Email reply to kenminsoo's message of Sun, Apr 2, 2023, 1:47 PM.)

kenminsoo commented 1 year ago

Thank you for the quick response! I am more curious about the non-miRNA features counted in the pipeline, such as circRNAs, snoRNAs, snRNAs, etc. I would like to normalize their counts by their transcript length.

I was thinking that I need to normalize these because their full sequences are longer than the fragments captured by the library prep protocol.
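If you have per-feature counts and full transcript lengths, one common length normalization is TPM (transcripts per million): divide each count by the transcript length in kilobases, then rescale so the values sum to one million. Below is a minimal Python sketch of that scheme. It is not something this thread shows COMPSRA producing, and the feature names and numbers are hypothetical. Whether full-length normalization suits small RNA-seq data is a judgment call: it assumes reads are spread along the whole transcript, which may not hold when the library captures only short fragments.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_nt: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million from raw counts and transcript lengths (nt).

    Step 1: reads per kilobase (RPK) = count * 1000 / length.
    Step 2: TPM = RPK * 1e6 / sum of all RPK values.
    Features without a positive length raise ValueError instead of being dropped.
    """
    rpk: Dict[str, float] = {}
    for feature, count in counts.items():
        length = lengths_nt.get(feature)
        if not length or length <= 0:
            raise ValueError(f"no positive length for feature {feature!r}")
        rpk[feature] = count * 1000.0 / length
    total = sum(rpk.values())
    if total == 0:
        # All counts are zero: nothing to rescale.
        return {feature: 0.0 for feature in rpk}
    return {feature: value * 1e6 / total for feature, value in rpk.items()}


if __name__ == "__main__":
    # Same count, but featB is four times longer, so it gets a quarter of featA's RPK.
    counts = {"featA": 100, "featB": 100}
    lengths = {"featA": 100, "featB": 400}
    print(tpm(counts, lengths))  # {'featA': 800000.0, 'featB': 200000.0}
```

The lengths could come from a FASTA of the annotated sequences, with the caveat that feature names in the count table must match the FASTA IDs exactly for the lookup to work.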