MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

complexity measurement #139

thalescherubino closed this 9 months ago

thalescherubino commented 1 year ago

Hi Mike.

A feature that I found useful in ShortStack 3 is gone: the reported "complexity" values. They helped in curating bona fide miRNAs and identifying false positives. Could you re-implement them?

Best,

MikeAxtell commented 9 months ago

Sorry for the slow response on this. Complexity can easily be computed from columns 7 and 8 of the Results.txt file. Column 7 ('Reads') gives the total number of reads aligned at the locus. Column 8 ('DistinctSequences') gives the total number of distinct sRNA sequences aligned at the locus. A given distinct sequence can have one or more associated reads. These are the two numbers that were used in the past to compute "complexity".
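As a minimal sketch, the ratio can be recomputed in Python. It assumes that the old "complexity" value is DistinctSequences divided by Reads, so it falls between 0 and 1 and is low for loci dominated by a single sequence. It also assumes that Results.txt is tab-delimited with a header row and that the first column identifies the locus. The function names are hypothetical and are not part of ShortStack.

```python
import csv
import io


def locus_complexity(reads: int, distinct_sequences: int) -> float:
    """Assumed complexity: distinct sequences per aligned read (0 < value <= 1)."""
    if reads <= 0:
        raise ValueError("a locus must have at least one aligned read")
    return distinct_sequences / reads


def complexity_from_results(handle) -> dict[str, float]:
    """Map each locus (first column) of a tab-delimited Results.txt to its complexity.

    Columns are located by header name ('Reads', 'DistinctSequences'),
    not by position, so the sketch does not depend on column order.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    reads_idx = header.index("Reads")
    distinct_idx = header.index("DistinctSequences")
    return {
        row[0]: locus_complexity(int(row[reads_idx]), int(row[distinct_idx]))
        for row in reader
        if row  # skip blank lines
    }


# Mock two-locus table (made-up values, not real ShortStack output).
example = io.StringIO(
    "Locus\tName\tReads\tDistinctSequences\n"
    "Chr1:100-220\tCluster_1\t1000\t5\n"
    "Chr2:500-900\tCluster_2\t200\t150\n"
)
print(complexity_from_results(example))
# {'Chr1:100-220': 0.005, 'Chr2:500-900': 0.75}
```

For a real run, pass a handle opened with `open(path, newline="")` on the Results.txt in the ShortStack output directory. A locus with 1000 reads from only 5 distinct sequences scores 0.005, flagging it as dominated by very few sequences.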

Column 8 in Results.txt was previously mis-named "UniqueReads", and its description in the README was incorrect. As of release 4.0.3, the README description has been fixed and the column renamed "DistinctSequences".
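A script that must also read Results.txt files from releases before 4.0.3 could look up the distinct-sequence column under either header name. This is a minimal Python sketch. It assumes that the column headed "UniqueReads" in those earlier ShortStack 4 releases holds the same distinct-sequence counts under the old, mis-named header. The helper name is hypothetical, and the names used for columns 1 to 6 are placeholders.

```python
def distinct_sequences_index(header: list[str]) -> int:
    """Return the position of the distinct-sequence count column in a Results.txt header.

    'DistinctSequences' is the name from release 4.0.3 on; 'UniqueReads' is
    the earlier, mis-named header for the same column (assumption: only
    ShortStack 4 output before 4.0.3 should be read this way).
    """
    for name in ("DistinctSequences", "UniqueReads"):
        if name in header:
            return header.index(name)
    raise KeyError("header has neither 'DistinctSequences' nor 'UniqueReads'")


leading = [f"col{i}" for i in range(1, 7)]  # placeholder names for columns 1-6
new_header = leading + ["Reads", "DistinctSequences"]
old_header = leading + ["Reads", "UniqueReads"]
print(distinct_sequences_index(new_header), distinct_sequences_index(old_header))
# 7 7  (zero-based index of column 8)
```

Looking the column up by name rather than by position also guards against any future change in column order.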