MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

Pipeline: tagged counting repurposed as classifier #241

Closed by AlexTate 1 year ago

AlexTate commented 1 year ago

The Tag column has been renamed to Classify as... and will be used to apply a user-defined class to features that match the rule. The Class= attribute is no longer used to determine a feature's class. Tagged counting semantics still apply.

The counts table produced by tiny-count therefore now has a MultiIndex of (Feature ID, Classifier). Counts tables produced by earlier versions of tinyRNA are not backward compatible. At pipeline/tiny-count startup, the Features Sheet is checked for a Tag column; if one is present, an error is raised along with steps to fix it.
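
As a rough illustration of the two changes described above (this is not tinyRNA's actual code), the sketch below builds a pandas counts table indexed by (Feature ID, Classifier) and checks a Features Sheet header for a legacy Tag column. The helper `check_features_sheet_header`, the error message wording, the sample column names, and the feature and class values are all hypothetical.

```python
import csv
import io

import pandas as pd


def check_features_sheet_header(csv_text: str) -> None:
    """Raise if the header row still has a legacy "Tag" column (hypothetical helper)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    if "Tag" in header:
        # Illustrative message only; the real error text is not shown in this thread.
        raise ValueError(
            'Features Sheet contains a "Tag" column; rename it to "Classify as..."'
        )


# Hypothetical counts keyed by (Feature ID, Classifier); sample names are made up.
counts = pd.DataFrame(
    {"sample_rep_1": [10.0, 4.0, 7.0], "sample_rep_2": [12.0, 3.0, 9.0]},
    index=pd.MultiIndex.from_tuples(
        [("geneA", "miRNA"), ("geneA", "siRNA"), ("geneB", "siRNA")],
        names=["Feature ID", "Classifier"],
    ),
)

# The same feature can appear once per classifier it was counted under.
gene_a = counts.xs("geneA", level="Feature ID")
```

Selecting on the `Feature ID` level returns one row per classifier, which is why a table with a single Feature ID index from an earlier version would not line up with this layout.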

These changes opened the door for some very satisfying improvements to the code quality in plotter.py. Two additional parameters have been added to the pipeline/tiny-plot:

Closes #240

AlexTate commented 1 year ago

Since this PR introduces changes that are backward incompatible, I would like to make a release for the project in its current state before this one is merged.

taimontgomery commented 1 year ago

With this new, much improved approach to classification, won't the class and rule plots always be the same? If so, can we get rid of the rule plots? Perhaps we could also rename counts_by_rule.csv to counts_by_classification.csv and change the Rule String column to Classification?

AlexTate commented 1 year ago

No, class and rule plots will differ if any rules share a Classify as... value. Rule plots can be used in this case to see how much each rule contributed to the pooled classes. For this reason, I think the proposed changes to output files would be incorrect.
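
To make the distinction concrete, here is a minimal sketch with made-up numbers (the rule indices, class names, and counts are hypothetical, and this is not the plotting code). Two rules share one `Classify as...` value, so the per-class total pools them while the per-rule view keeps them separate.

```python
import pandas as pd

# Hypothetical per-rule counts; rules 0 and 1 share the same "Classify as..." value.
by_rule = pd.DataFrame(
    {
        "Rule": [0, 1, 2],
        "Classify as...": ["siRNA", "siRNA", "miRNA"],
        "Count": [30, 70, 50],
    }
)

# Pooling by class hides how much each rule contributed.
by_class = by_rule.groupby("Classify as...")["Count"].sum()

# Per-rule share of its pooled class total.
share = by_rule["Count"] / by_rule["Classify as..."].map(by_class)
```

Here `by_class` has two entries while `by_rule` has three rows, and `share` recovers each rule's contribution to its class.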

taimontgomery commented 1 year ago

I see. In that case, perhaps we can add a counts_by_classification.csv table at some point.

taimontgomery commented 1 year ago

Tested successfully with ram1 data.