MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

Pipeline: new location for configuring GFF files and aliases #245

Closed AlexTate closed 1 year ago

AlexTate commented 1 year ago

Config file changes: The Alias by... and Feature Source columns have been removed from the Features Sheet. This is a healthy change because, within each rule, these two columns were coupled only to each other and not to any of the other columns, which understandably led to some confusion.

GFF file inputs are now defined in the Paths File, where all other non-sample file inputs reside. In YAML, the GFF entry is a list of mappings, where each list item holds the path to a file and an optional list of alias attributes for that file. When the Paths File is parsed, only unique GFF files are retained. If there are duplicate entries for the same path with different aliases, the aliases are merged, with duplicates removed and order preserved.
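The merge behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `merge_gff_entries` and the mapping keys `path` and `alias` are assumptions made for the example, and paths are compared as plain strings without any normalization.

```python
def merge_gff_entries(entries):
    """Collapse duplicate GFF entries, merging their aliases.

    `entries` is a list of mappings such as
    {"path": "a.gff3", "alias": ["ID", "Name"]}; the key names are
    hypothetical and chosen only for this sketch.
    """
    merged = {}  # dicts preserve insertion order, so file order is kept
    for entry in entries:
        aliases = merged.setdefault(entry["path"], [])
        # "alias" is optional and may be missing or empty
        for alias in entry.get("alias") or []:
            if alias not in aliases:  # drop duplicates, keep first-seen order
                aliases.append(alias)
    return [{"path": path, "alias": aliases} for path, aliases in merged.items()]


if __name__ == "__main__":
    parsed = [
        {"path": "a.gff3", "alias": ["ID", "Name"]},
        {"path": "b.gff3"},
        {"path": "a.gff3", "alias": ["Name", "sequence_name"]},
    ]
    for item in merge_gff_entries(parsed):
        print(item)
```

Here the two entries for `a.gff3` collapse into one whose aliases are `ID`, `Name`, `sequence_name`, in that order, while `b.gff3` is kept with an empty alias list.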

Command line argument changes: The command line arguments for tiny-count have been updated accordingly. Rather than adding the Paths File to the two existing inputs (Samples Sheet and Features Sheet), users need only pass the Paths File which contains the locations of all required file inputs.

Codebase improvements: A new class, PathsFile, has been added to configuration.py to act as an API to tiny-count and the Configuration class. It validates the config file at construction and automatically resolves relative paths upon lookup. This applies in both "pipeline" mode and standalone mode.
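To illustrate the idea of validating at construction and resolving relative paths on lookup, here is a small sketch. It is not the actual PathsFile implementation: the class name `PathsFileSketch`, the required keys, and the choice to resolve paths relative to the directory containing the config file are all assumptions made for the example.

```python
import os


class PathsFileSketch:
    """Hypothetical stand-in for a paths-config wrapper (not the real PathsFile)."""

    # Keys assumed to be required, for illustration only
    REQUIRED = ("samples_csv", "features_csv")

    def __init__(self, config, file_location):
        # Validate once, at construction time
        missing = [key for key in self.REQUIRED if not config.get(key)]
        if missing:
            raise ValueError("Missing required keys: " + ", ".join(missing))
        self.config = dict(config)
        # Assumption: relative paths are relative to the config file's directory
        self.basedir = os.path.dirname(os.path.abspath(file_location))

    def __getitem__(self, key):
        value = self.config[key]
        # Resolve lazily, at lookup time; absolute paths pass through unchanged
        if isinstance(value, str) and value and not os.path.isabs(value):
            return os.path.normpath(os.path.join(self.basedir, value))
        return value
```

With this sketch, a lookup such as `paths["samples_csv"]` returns a path anchored to the config file's directory regardless of the caller's working directory, which is one way the same object could serve both pipeline and standalone use.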

Misc. changes and bugfixes:

Closes #234

AlexTate commented 1 year ago

Merge conflicts have been resolved.

taimontgomery commented 1 year ago

Tested successfully on ram1 data.