MontgomeryLab / tinyRNA

tinyRNA provides an all-in-one solution for precision analysis of sRNA-seq data. At the core of tinyRNA is a highly flexible counting utility, tiny-count, that allows for hierarchical assignment of reads to features based on positional information, extent of feature overlap, 5’ nucleotide, length, and strandedness.
GNU General Public License v3.0

Pipeline: log file preservation for failing steps and new backward compatibility system #276

Closed: AlexTate closed this pull request 1 year ago

AlexTate commented 1 year ago

A new subdirectory, logs, has been added to the Run Directory. Log files for all workflow steps are now stored there rather than in each step's own subdirectory. This preserves the logs produced by failing workflow steps, which is critical for troubleshooting.

Closes #275

taimontgomery commented 1 year ago

What about storing the log files for each step in the same directory, for example one directory for bowtie containing the logs for every library, one directory for collapser, etc.? It's a little busy with each library in its own directory.

AlexTate commented 1 year ago

That was my initial preference too, but I think this approach actually makes the most sense from a troubleshooting perspective, because the job ID is printed several times around each step, including in the final error output (hard to miss). We've looked at tinyRNA's terminal output so often that we know exactly what to look for when identifying which library a job is processing, but for new users that information would be a needle in the haystack of cwltool output.

I anticipate that most users will only look at the logs when they are troubleshooting a failed step.
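To illustrate the troubleshooting workflow described above, here is a minimal sketch of going from a job ID in the error output to that job's log files. It assumes a hypothetical layout of <run_dir>/logs/<job_id>/, one subdirectory per job, and the helper name find_job_logs is illustrative only; it is not part of tinyRNA.

```python
from pathlib import Path


def find_job_logs(run_dir, job_id):
    """Return the log files stored for one workflow job, sorted by path.

    Assumes a hypothetical layout of <run_dir>/logs/<job_id>/..., where
    job_id is the job name printed in the terminal / final error output.
    """
    job_dir = Path(run_dir) / "logs" / job_id
    if not job_dir.is_dir():
        raise FileNotFoundError(f"No logs found for job '{job_id}' in {job_dir.parent}")
    # Collect every file under the job's log directory, including nested ones
    return sorted(p for p in job_dir.rglob("*") if p.is_file())
```

For example, if a failed job were reported as bowtie_2 (an illustrative name), find_job_logs(run_dir, "bowtie_2") would list that job's logs even though the step itself produced no output directory.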

AlexTate commented 1 year ago

I've updated the workflow CWL so that the log and step output subdirectories match tinyRNA's current tool names. Users won't have to infer that collapser, counter, dge, and plotter refer to tiny-collapse, tiny-count, etc., which might help with the optics of the log directory. Unfortunately, cwltool only uses the label field for workflow visualization; it would be very useful for naming log directories.

New backward compatibility system

I've also introduced a basic system for backward compatibility with configuration files. In the past I made efforts to avoid compatibility issues with old configuration files, but the approach was disorganized and meant to be temporary. The new system centralizes this patchwork and provides a more robust means of documenting changes in a way that is clear and easy to maintain.

Right now the system supports only the Run Config, and only for the changes introduced in this PR, but once more pressing tasks are completed I'll loop back to this and finish the implementation. The idea is that changes (keys added, renamed, and removed) are defined for each config file in its own YAML file (templates/compatibility/*.yaml). Maintainers can also mark a version as blocked when its changes can't or shouldn't be automated (e.g. semantic changes to keys rather than simple renames), which prompts the user to update the file manually. For now these changes are applied only in memory, but it would be trivial to add a command for writing upgraded files to disk.
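A minimal sketch of how such a changelog could be applied in memory. The YAML is represented here as the list of dicts that a YAML loader would return, to keep the example self-contained; the schema (the version, add, rename, remove, and block fields), the key names, and the helpers upgrade_config and ManualUpdateRequired are all hypothetical and are not tinyRNA's actual implementation.

```python
from copy import deepcopy

# Hypothetical changelog for one config file, standing in for a parsed
# templates/compatibility/*.yaml file. All key names are placeholders.
CHANGES = [
    {"version": "1.1.0", "rename": {"old_key": "new_key"}, "add": {"added_key": "default"}},
    {"version": "1.2.0", "remove": ["obsolete_key"]},
    {"version": "2.0.0", "block": "'some_key' changed meaning."},
]


class ManualUpdateRequired(Exception):
    """Raised when a version's changes are marked as not automatable."""


def parse_version(version):
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def upgrade_config(config, config_version, changes):
    """Return an upgraded copy of config; the original dict is not modified."""
    upgraded = deepcopy(config)
    current = parse_version(config_version)
    for change in sorted(changes, key=lambda c: parse_version(c["version"])):
        if parse_version(change["version"]) <= current:
            continue  # the config already reflects this version's changes
        if change.get("block"):
            # Semantic changes can't be automated: ask the user to intervene
            raise ManualUpdateRequired(
                f"Version {change['version']}: {change['block']} Please update this file manually."
            )
        for old, new in change.get("rename", {}).items():
            if old in upgraded:
                upgraded[new] = upgraded.pop(old)
        for key in change.get("remove", []):
            upgraded.pop(key, None)
        for key, default in change.get("add", {}).items():
            upgraded.setdefault(key, default)  # keep any user-supplied value
    return upgraded
```

Calling upgrade_config({"old_key": 1, "obsolete_key": True}, "1.0.0", CHANGES[:2]) returns {'new_key': 1, 'added_key': 'default'}, while passing the full CHANGES list raises ManualUpdateRequired at the blocked 2.0.0 entry. Working on a deep copy keeps the upgrade in memory, so writing the result to disk could be added later as a separate step.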