MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

tiny-deseq.r: bugfix: "syntactically invalid" control condition names are mishandled #266

Closed AlexTate closed 1 year ago

AlexTate commented 1 year ago

R has strict requirements for what counts as a "syntactically valid" name in a variety of contexts, including column names for tables. We carefully handle these names in a manner that is compatible with R while still preserving the original names in outputs.

A recent refactor of tiny-deseq.r, intended to make the codebase easier to maintain when the format of feature_counts.csv changes, introduced a bug in the handling of the --control command line value when it contains forbidden characters. In that case, tiny-deseq.r exits with an error before producing DGE tables. This bug went undetected because, by chance, the control groups in our test datasets all have "syntactically valid" names.
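To illustrate the failure mode, here is a minimal Python sketch of the idea rather than the actual tiny-deseq.r code. It assumes the translation behaves roughly like R's make.names(): characters other than letters, digits, "." and "_" become ".", and names that do not start with a letter (or with a dot not followed by a digit) get an "X" prefix. Reserved words and non-ASCII letters are ignored here. The names to_r_safe, resolve_control and the example conditions are hypothetical.

```python
import re

def to_r_safe(name: str) -> str:
    """Rough, ASCII-only approximation of R's make.names() (assumption;
    reserved words such as 'if' or 'TRUE' are not handled)."""
    safe = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # Must start with a letter, or with a dot that is not followed by a digit
    if not re.match(r"[A-Za-z]|\.(?![0-9])", safe):
        safe = "X" + safe
    return safe

# Hypothetical condition names as a user might write them
conditions = ["N2 (wild-type)", "prg-1", "hrde_1"]

# Columns are keyed by the translated name; originals are kept for outputs
safe_to_original = {to_r_safe(c): c for c in conditions}

def resolve_control(control: str) -> str:
    """Translate the --control value the same way as the column names
    before looking it up, then return the translated name."""
    safe = to_r_safe(control)
    if safe not in safe_to_original:
        raise KeyError(f"control condition {control!r} not found")
    return safe

control_safe = resolve_control("N2 (wild-type)")
control_original = safe_to_original[control_safe]
```

Looking up the raw string "N2 (wild-type)" directly in safe_to_original would fail, which is the kind of mismatch described above. Translating the control value first and mapping back through safe_to_original keeps the original name available for output tables.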

Edit 12/23: reopening this issue to add a proactive check in configuration.py's SamplesSheet class to detect namespace collisions that result from the "syntactically valid" translation.
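A collision check of this kind could look something like the sketch below. This is not the SamplesSheet implementation. It assumes a make.names()-like translation (approximated here in Python, ASCII only, reserved words ignored), and the function names and sample group names are hypothetical.

```python
import re
from collections import defaultdict

def to_r_safe(name: str) -> str:
    """Rough approximation of R's make.names() (assumption, see note above)."""
    safe = re.sub(r"[^A-Za-z0-9._]", ".", name)
    if not re.match(r"[A-Za-z]|\.(?![0-9])", safe):
        safe = "X" + safe
    return safe

def find_collisions(group_names):
    """Return {translated_name: [original names]} for every translated name
    that is shared by more than one distinct original name."""
    groups = defaultdict(list)
    # dict.fromkeys drops exact duplicates (e.g. replicates of one group)
    # while preserving the order in which names first appear
    for name in dict.fromkeys(group_names):
        groups[to_r_safe(name)].append(name)
    return {safe: names for safe, names in groups.items() if len(names) > 1}

# Hypothetical group names: three of them translate to "prg.1"
collisions = find_collisions(["prg-1", "prg.1", "prg 1", "N2", "N2"])
if collisions:
    for safe, names in collisions.items():
        print(f"Group names {names} all translate to {safe!r}")
```

Running the check while the samples sheet is being validated would let the user rename the groups before any R step runs, instead of hitting an error later in tiny-deseq.r.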