Raw counts for downstream analysis

a-slide / NanoCount

EM based transcript abundance from nanopore reads mapped to a transcriptome with minimap2

https://a-slide.github.io/NanoCount/

MIT License


Raw counts for downstream analysis #24

Closed: lubitelpospat closed this issue 1 year ago

lubitelpospat commented 1 year ago

Hello, I am trying to figure out how we can use your tool for differential expression analysis. In the NAR paper, you use DESeq2, which requires integer counts, but the tool only produces floats for estimated counts (and TPMs). Can you please clarify how you prepared the counts for the downstream DE analysis? The scripts in the https://github.com/josiegleeson/directRNA repo do not show these manipulations.

josiegleeson commented 1 year ago

Hi, no problem. After importing the data, I simply used the round() function in R on the data frame of counts to convert them to integers. Hope this helps! Josie.
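For readers who want to see the rounding step spelled out, here is a minimal sketch in Python rather than R. It is not the script used for the paper. The table contents are made up for illustration, and the 'transcript_name' column name is an assumption; only 'est_count' is named in this thread. Python's built-in round(), like R's, rounds exact halves to the nearest even integer.

```python
import csv
import io

# Illustrative stand-in for a NanoCount output table. The values are made up,
# and 'transcript_name' is an assumed column name; only 'est_count' is
# mentioned in the thread.
EXAMPLE_ROWS = [
    ("transcript_name", "est_count"),
    ("tx1", "12.49"),
    ("tx2", "3.7"),
    ("tx3", "0.4"),
    ("tx4", "2.5"),
]
EXAMPLE_TSV = "\n".join("\t".join(row) for row in EXAMPLE_ROWS) + "\n"


def round_counts(tsv_text):
    """Parse a tab-separated table and round each est_count to an integer."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # round() with a single float argument returns an int and rounds exact
    # halves to the nearest even integer (2.5 -> 2).
    return {row["transcript_name"]: round(float(row["est_count"])) for row in reader}


rounded = round_counts(EXAMPLE_TSV)
print(rounded)  # {'tx1': 12, 'tx2': 4, 'tx3': 0, 'tx4': 2}
```

Note that the made-up transcript with an estimated count of 0.4 becomes 0 after rounding.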

Stakaitis commented 2 months ago

Hi @josiegleeson, does that mean that when you use the round() command in R, you discard some of the data? For example, if a row in the 'est_count' column has a value <0.5, round() will round it to 0.

In my case, I have 6232 rows in the NanoCount output, but only 3475 of them have a value >0.5 in the 'est_count' column. This means that after round(), almost half of the rows will be zeros. Is it the same with your data?
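The arithmetic behind "almost half" can be checked with a short Python sketch. It uses only the two numbers quoted above and assumes that every row whose 'est_count' is not above 0.5 rounds to 0, which holds for round-half-to-even rounding as used by R's round().

```python
# Figures quoted in the comment above.
total_rows = 6232
rows_above_half = 3475

# Assumption: every row not above 0.5 becomes 0 after rounding
# (an exact 0.5 also rounds to 0 under round-half-to-even).
rows_rounding_to_zero = total_rows - rows_above_half
fraction_zero = rows_rounding_to_zero / total_rows

print(rows_rounding_to_zero)   # 2757
print(f"{fraction_zero:.1%}")  # 44.2%
```

So roughly 44% of the rows would end up as zero counts after rounding.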