MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

complexity measurement #139

Closed by thalescherubino 11 months ago

thalescherubino commented 1 year ago

Hi Mike.

A feature that I found useful in ShortStack 3 is gone. The reported "complexity" values helped with curating bona fide miRNAs and identifying false positives. Could you reimplement that?

Best,

MikeAxtell commented 11 months ago

Sorry for the slow response on this. This can be easily computed using columns 7 and 8 in the Results.txt file. Column 7 ('Reads') gives the total number of reads aligned at the locus. Column 8 ('DistinctSequences') gives the total number of distinct sRNA sequences aligned at the locus. A given distinct sequence can have one or more associated reads. These are the two numbers that in the past were used to compute "complexity".

Column 8 in Results.txt was previously misnamed "UniqueReads", and its description in the README was incorrect. As of release 4.0.3 the README description has been fixed and the column renamed to "DistinctSequences".
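
For anyone who wants to recompute the value, here is a minimal sketch in Python. It rests on some assumptions that are not stated in this thread: that complexity is the ratio DistinctSequences / Reads (the thread names the two inputs but not the formula), that Results.txt is tab-delimited with a single header row, and that both columns hold integers. The function name `complexity_by_locus`, the placeholder column names `col1` to `col6`, and the sample rows are made up for illustration. Columns are read by position (7 and 8), so the sketch does not depend on whether column 8 is labeled "UniqueReads" or "DistinctSequences".

```python
import csv
import io

# Made-up sample in the assumed Results.txt layout (tab-delimited, one header row).
# Only columns 7 and 8 matter here; col1..col6 are placeholder names.
SAMPLE = (
    "col1\tcol2\tcol3\tcol4\tcol5\tcol6\tReads\tDistinctSequences\n"
    "locusA\t.\t.\t.\t.\t.\t1000\t20\n"
    "locusB\t.\t.\t.\t.\t.\t50\t40\n"
)

def complexity_by_locus(handle):
    """Yield (first-column value, complexity) for each data row.

    Assumption: complexity = DistinctSequences / Reads.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 8:
            continue  # skip short or blank lines
        reads = int(row[6])     # column 7, 'Reads'
        distinct = int(row[7])  # column 8, 'DistinctSequences'
        if reads == 0:
            continue  # avoid division by zero
        yield row[0], distinct / reads

for locus, complexity in complexity_by_locus(io.StringIO(SAMPLE)):
    print(f"{locus}\t{complexity:.3f}")
```

To run this on real output, pass an open file instead of the sample, for example `with open("Results.txt", newline="") as fh:` followed by the same loop over `complexity_by_locus(fh)`. Under the assumed formula, values near 0 indicate a locus dominated by a few sequences, and values near 1 indicate that almost every read is a different sequence.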