MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

Pipeline: auto-documentation improvements #312

Closed: AlexTate closed 1 year ago

AlexTate commented 1 year ago

Auto-documentation for runs is now more complete and consistent. The config files used for repeating analyses are now stored separately from those intended as documentation, so users no longer risk losing auto-documentation info if they don't make a copy before preparing a resume run.

Config Files for Repeat Analyses

All four primary configuration files are now copied to the root of the Run Directory, where they can be freely edited for resume runs without sacrificing auto-documentation. Previously, this was done only for the processed Run Config and Samples Sheet, while the Features Sheet and Paths File remained in place but could be modified between runs. Paths are automatically adjusted so that the copied files form a cohesive working configuration: references from one config file to another are converted to relative paths that point to the adjacent copies in the target Run Directory, and all other paths are converted from relative to absolute.
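The path adjustment rule can be illustrated with a minimal sketch. This is not tinyRNA's implementation; the function name `adjust_paths`, the key names in `CONFIG_KEYS`, and the flat dictionary layout are all assumptions made for illustration.

```python
import os

# Hypothetical keys whose values point at other config files.
# The keys actually used by tinyRNA may differ.
CONFIG_KEYS = frozenset({"samples_csv", "features_csv", "paths_config"})


def adjust_paths(entries, source_dir, config_keys=CONFIG_KEYS):
    """Return a copy of `entries` with paths rewritten for a Run Directory copy.

    entries:    mapping of key -> path string, as read from a config file
    source_dir: directory of the original config file, used to resolve
                relative paths
    """
    adjusted = {}
    for key, path in entries.items():
        if key in config_keys:
            # Config-to-config reference: point at the adjacent copy,
            # which sits next to this file in the Run Directory root.
            adjusted[key] = os.path.basename(path)
        elif os.path.isabs(path):
            # Already absolute: leave untouched.
            adjusted[key] = path
        else:
            # Any other relative path: resolve against the original location.
            adjusted[key] = os.path.abspath(os.path.join(source_dir, path))
    return adjusted
```

With this rule, the copied files keep working regardless of where the Run Directory is moved relative to the original configs, while references among the four copies stay local to the Run Directory.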

Config Files for Auto-documentation

A new subdirectory, config, has been added to the Run Directory outputs. It holds a copy of the four primary config files and is intended for auto-documentation only. During each resume run, a new timestamped config directory is created to hold copies of the config files that were used for that run. If repeated analyses are performed, the outputs of the most recent analysis are now used; for example, if a replot run follows a recount run, only the most recent recount outputs are used to produce graphs.
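A minimal sketch of how a timestamped documentation copy could be made for each resume run is shown here. The helper name `snapshot_configs` and the directory naming pattern `config_<timestamp>` are assumptions; the PR does not state the exact format tinyRNA uses.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_configs(run_dir, config_files, now=None):
    """Copy the config files used by a resume run into a new timestamped directory.

    The "config_YYYY-MM-DD_HH-MM-SS" name is an assumed pattern for illustration.
    """
    now = now or datetime.now()
    dest = Path(run_dir) / f"config_{now:%Y-%m-%d_%H-%M-%S}"
    # Fail rather than silently overwrite an earlier snapshot.
    dest.mkdir(parents=True, exist_ok=False)
    for config_file in config_files:
        # copy2 also preserves file metadata such as modification time.
        shutil.copy2(config_file, dest / Path(config_file).name)
    return dest
```

Because these copies live apart from the editable files in the Run Directory root, later edits for another resume run do not alter the record of what an earlier run used.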

Backward Compatibility

Performing a resume run in an old-style Run Directory automatically converts it. After this conversion, the Run Directory behaves just like a new one during subsequent resume runs. The conversion does the following:

  1. The four configuration files with cohesive paths are placed in the root of the Run Directory.
  2. The existing processed Run Config is placed in a config directory. The other files are not included because they are likely to have been edited for other runs (due to the behavior described above). This is only a minor loss, since prior tiny-count output directories contain a copy of the Features Sheet that was used, and both the Paths File and Samples Sheet had already been absorbed into the processed Run Config.
  3. A timestamped config directory is also created to hold the resume run's four config files.
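The three conversion steps can be sketched as follows. This is an illustrative outline only: the function name, the use of a missing config subdirectory to detect an old-style Run Directory, the choice to move (rather than copy) the old processed Run Config, and the timestamp format are all assumptions not confirmed by the PR.

```python
import shutil
from datetime import datetime
from pathlib import Path


def convert_old_run_dir(run_dir, old_run_config_name, cohesive_configs, now=None):
    """Convert an old-style Run Directory; return the new timestamped directory.

    cohesive_configs: the four config files with already-adjusted paths,
    prepared elsewhere. Returns None if no conversion was needed.
    """
    run_dir = Path(run_dir)
    doc_dir = run_dir / "config"
    if doc_dir.exists():
        return None  # assumed marker of a new-style Run Directory
    now = now or datetime.now()

    # Step 2 is done first here so the old processed Run Config is
    # preserved before a same-named file is written to the root.
    doc_dir.mkdir()
    shutil.move(str(run_dir / old_run_config_name), str(doc_dir / old_run_config_name))

    # Step 1: place the four cohesive config files in the root.
    for config_file in cohesive_configs:
        shutil.copy2(config_file, run_dir / Path(config_file).name)

    # Step 3: document this resume run in a timestamped directory.
    stamped = run_dir / f"config_{now:%Y-%m-%d_%H-%M-%S}"
    stamped.mkdir()
    for config_file in cohesive_configs:
        shutil.copy2(config_file, stamped / Path(config_file).name)
    return stamped
```

Calling the function a second time on the same directory is a no-op under these assumptions, which mirrors the statement that a converted Run Directory behaves like a new one afterwards.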

Closes #311

taimontgomery commented 1 year ago

Tested with ram1 dataset.