MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

tiny-config: SamplesSheet: validate group names to avoid namespace collisions in R #268

Closed: AlexTate closed this 1 year ago

AlexTate commented 1 year ago

R has strict character requirements for "syntactically valid" names in a variety of contexts, including column names. Sample group names must therefore be translated to a valid form before analysis in tiny-deseq.r. Because this translation is not one-to-one, different group names can end up with the same "safe name", which leads to a crash. For example, the group names a-b and a+b both translate to a.b.

This PR adds a validation step to the SamplesSheet class that catches these namespace collisions at pipeline startup, before any analysis runs. On failure it raises an error message that lists all collisions, grouped by their shared "safe name".
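
For illustration, the check can be sketched roughly as follows. This is a minimal standalone sketch, not the code in this PR: the function names r_safe_name, find_collisions, and validate_group_names are hypothetical, and the translation is only an approximation of R's make.names() (it ignores reserved words, for instance), so the pipeline's actual translation may differ.

```python
import re
from collections import defaultdict


def r_safe_name(name: str) -> str:
    """Approximate R's make.names(): hypothetical helper, not tinyRNA's code."""
    # Characters other than letters, digits, "." and "_" become "."
    safe = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # A valid name starts with a letter, or with a dot not followed by a digit;
    # otherwise an "X" is prepended
    if not re.match(r"^([A-Za-z]|\.(?![0-9]))", safe):
        safe = "X" + safe
    return safe


def find_collisions(group_names):
    """Map each safe name to the distinct group names that translate to it,
    keeping only safe names shared by more than one group name."""
    by_safe_name = defaultdict(set)
    for name in group_names:
        # A set is used so that replicates of the same group are not flagged
        by_safe_name[r_safe_name(name)].add(name)
    return {
        safe: sorted(names)
        for safe, names in by_safe_name.items()
        if len(names) > 1
    }


def validate_group_names(group_names):
    """Raise one error listing every collision, grouped by shared safe name."""
    collisions = find_collisions(group_names)
    if collisions:
        lines = [f"  {safe}: {', '.join(names)}" for safe, names in collisions.items()]
        raise ValueError(
            "Group names collide after translation to R-safe names:\n"
            + "\n".join(lines)
        )
```

Calling validate_group_names(["a-b", "a+b", "ctrl"]) raises a ValueError that names a.b along with both offending group names, while validate_group_names(["a-b", "a-b", "ctrl"]) passes because repeated use of one group name is not a collision.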

Closes #266

taimontgomery commented 1 year ago

Tested successfully on ram1 dataset.