OpenMS / streamlit-template

Framework for TOPP tool workflows #41

Closed axelwalter closed 2 months ago

axelwalter commented 3 months ago

Framework for TOPP tool workflows

Features

Quickstart

This repository contains a module in src/workflow that provides a framework for building and running analysis workflows.

The WorkflowManager class provides the core workflow logic. It uses the Logger, Files, DirectoryManager, ParameterManager, and CommandExecutor classes to set up the complete workflow.

To build your own workflow, edit the file src/TOPPWorkflow.py. Use any streamlit components, such as tabs (as shown in the example), columns, or expanders, to organize the helper functions that display file upload and parameter widgets.

Set a name for the workflow and override the upload, parameter, execution, and results methods in your Workflow class.

The file pages/6_TOPP-Workflow.py displays the workflow content. It can be modified, but does not have to be.

The Workflow class contains four important members, which you can use to build your own workflow:

self.params: dictionary of parameters stored in a JSON file in the workflow directory. Parameter handling is done automatically. Default values are defined in input widgets and non-default values are stored in the JSON file.

self.ui: object of type StreamlitUI that contains helper functions for building the parameter and file upload widgets.

self.executor: object of type CommandExecutor that can run any command-line tool, alone or in parallel, and includes a convenient method for running TOPP tools.

self.logger: object of type Logger to write any output to a log file during workflow execution.
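
To illustrate the parameter handling described for self.params, here is a minimal sketch that writes only non-default values to a JSON file and merges them back over the defaults on load. The names save_params and load_params, the file name params.json, and the noise-threshold parameter are assumptions made for this example; this is not the actual ParameterManager code.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical defaults; in the framework they are defined by the input widgets.
DEFAULTS = {"run-adduct-detection": False, "noise-threshold": 1000.0}


def save_params(path: Path, current: dict) -> None:
    """Write only the values that differ from their defaults."""
    non_default = {k: v for k, v in current.items() if DEFAULTS.get(k) != v}
    path.write_text(json.dumps(non_default, indent=2))


def load_params(path: Path) -> dict:
    """Merge the stored non-default values over the defaults."""
    stored = json.loads(path.read_text()) if path.exists() else {}
    return {**DEFAULTS, **stored}


# A temporary directory stands in for the workflow directory.
with tempfile.TemporaryDirectory() as workflow_dir:
    params_file = Path(workflow_dir) / "params.json"
    save_params(
        params_file,
        {"run-adduct-detection": True, "noise-threshold": 1000.0},
    )
    stored_on_disk = json.loads(params_file.read_text())
    params = load_params(params_file)
```

Storing only the differences keeps the JSON file small and means a changed widget default takes effect for every parameter the user never touched.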

Input and output files for processes in the Workflow.execution method are handled with the Files class, which takes care of file types and the creation of output directories.
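
The idea behind deriving output files from input files can be sketched roughly as follows. The helper derive_output_paths is hypothetical and only mimics what a call such as Files(in_mzML, "featureXML", "feature-detection") is described to do: keep the file stem, swap the file type, and create the results directory. The real Files class may work differently.

```python
import tempfile
from pathlib import Path


def derive_output_paths(inputs, file_type, result_dir):
    """Map each input file to an output path with a new file type.

    Hypothetical helper: creates result_dir if it is missing and keeps
    the stem of every input file.
    """
    out_dir = Path(result_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return [str(out_dir / f"{Path(f).stem}.{file_type}") for f in inputs]


# A temporary directory stands in for the workflow directory.
with tempfile.TemporaryDirectory() as workflow_dir:
    mzml_files = ["sample_a.mzML", "sample_b.mzML"]
    out_dir = Path(workflow_dir) / "feature-detection"
    feature_files = derive_output_paths(mzml_files, "featureXML", out_dir)
    names = [Path(f).name for f in feature_files]
    dir_created = out_dir.is_dir()
```

Because the output list has the same length and order as the input list, input and output files can be paired by index when a tool is run once per file.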

Example code for a workflow

import streamlit as st
from .workflow.WorkflowManager import WorkflowManager
from .workflow.Files import Files

class Workflow(WorkflowManager):
    # Setup pages for upload, parameter, execution and results.
    # For layout use any streamlit components such as tabs (as shown in example), columns, or even expanders.
    def __init__(self):
        # Initialize the parent class with the workflow name.
        super().__init__("TOPP Workflow")

    def upload(self):
        t = st.tabs(["MS data", "Example with fallback data"])
        with t[0]:
            # Use the upload method from StreamlitUI to handle mzML file uploads.
            self.ui.upload(key="mzML-files", name="MS data", file_type="mzML")
        with t[1]:
            # Example with fallback data (not used in workflow)
            self.ui.upload(key="image", file_type="png", fallback="assets/OpenMS.png")

    def parameter(self) -> None:
        # Allow users to select mzML files for the analysis.
        self.ui.select_input_file("mzML-files", multiple=True)

        # Create tabs for different analysis steps.
        t = st.tabs(
            ["**Feature Detection**", "**Adduct Detection**", "**SIRIUS Export**"]
        )
        with t[0]:
            self.ui.input_TOPP("FeatureFinderMetabo")
        with t[1]:
            self.ui.input("run-adduct-detection", False, "Adduct Detection")
            self.ui.input_TOPP("MetaboliteAdductDecharger")
        with t[2]:
            self.ui.input_TOPP("SiriusExport")

    def execution(self) -> None:
        # Wrap mzML files into a Files object for processing.
        in_mzML = Files(self.params["mzML-files"], "mzML")

        # Log any messages.
        self.logger.log(f"Number of input mzML files: {len(in_mzML)}")

        # Prepare output files for feature detection.
        out_ffm = Files(in_mzML, "featureXML", "feature-detection")

        # Run FeatureFinderMetabo tool with input and output files.
        self.executor.run_topp(
            "FeatureFinderMetabo", input_output={"in": in_mzML, "out": out_ffm}
        )

        # Check if adduct detection should be run.
        if self.params["run-adduct-detection"]:

            # Run MetaboliteAdductDecharger for adduct detection, with disabled logs.
            # Without a new Files object for output, the input files will be overwritten in this case.
            self.executor.run_topp(
                "MetaboliteAdductDecharger", {"in": out_ffm, "out_fm": out_ffm}, write_log=False
            )

        # Combine input files for SiriusExport (can process multiple files at once).
        in_mzML.combine()
        out_ffm.combine()

        # Prepare output file for SiriusExport.
        out_se = Files(["sirius-export.ms"], "ms", "sirius-export")

        # Run SiriusExport tool with the combined files.
        self.executor.run_topp("SiriusExport", {"in": in_mzML, "in_featureinfo": out_ffm, "out": out_se})

    def results(self) -> None:
        st.warning("Not implemented yet.")
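
The difference between per-file runs and combined files in the execution method above can be made concrete with a small sketch of how a parameter-to-files mapping might be expanded into command lines. The function build_commands and the nested-list representation of combined files are assumptions for illustration only; this is not the CommandExecutor implementation, and no tool is actually executed.

```python
def build_commands(tool, input_output):
    """Expand a mapping of tool parameters to files into command lists.

    Assumption for this sketch: each value is either a list of paths
    (one path per run) or a list holding a single inner list (all files
    combined into one run).
    """
    n_runs = max(len(files) for files in input_output.values())
    commands = []
    for i in range(n_runs):
        cmd = [tool]
        for flag, files in input_output.items():
            # A single entry is reused for every run.
            entry = files[i] if len(files) > 1 else files[0]
            cmd.append(f"-{flag}")
            # A combined entry passes all of its paths to one parameter.
            cmd.extend(entry if isinstance(entry, list) else [entry])
        commands.append(cmd)
    return commands


# One command per input file.
per_file = build_commands(
    "FeatureFinderMetabo",
    {"in": ["a.mzML", "b.mzML"], "out": ["a.featureXML", "b.featureXML"]},
)

# A single command that receives all files at once.
combined = build_commands(
    "SiriusExport",
    {
        "in": [["a.mzML", "b.mzML"]],
        "in_featureinfo": [["a.featureXML", "b.featureXML"]],
        "out": ["sirius-export.ms"],
    },
)
```

Here per_file contains two commands that could run in parallel, while combined contains one command whose in and in_featureinfo parameters each list both files.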

timosachsenberg commented 3 months ago

Maybe names:

axelwalter commented 3 months ago

@JeeH-K and @timosachsenberg thanks a lot for the detailed review! Will work on a better way to handle files (e.g. in a FileManager class) and keep streamlit strictly to the StreamlitUI class.

axelwalter commented 3 months ago

With the recent changes, all streamlit functionality has been moved into the StreamlitUI class. Helper classes are initialized only once in WorkflowManager. The Files class has been reworked into a FileManager class; an instance of it is a member of WorkflowManager and accessible for workflow construction.