databio / pypiper

Python toolkit for building restartable pipelines
http://pypiper.databio.org
BSD 2-Clause "Simplified" License

Pypiper should utilize pipestat to report results #187

Closed donaldcampbelljr closed 1 year ago

donaldcampbelljr commented 1 year ago

Related to: https://github.com/pepkit/pipestat/issues/21

Currently, Pypiper uses the function _safe_write_to_file to write items to a local file. Pypiper should use pipestat.report instead.

As a specific example, stats are written to a stats.tsv file in a table format:

key1    abc    sample_pipeline
key2    def    shared

Calling pipestat.report from within a Pypiper pipeline manager could write the same stats instead:

default_pipeline_name:
  project: {}
  sample:
    sample_pipeline:
      Result: key1    abc    sample_pipeline
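
To make the mapping from table rows to nested results concrete, here is a minimal sketch using only the standard library. It does not call pipestat; the function name stats_rows_to_results, the choice to group records by the third column, and the key-to-value layout are assumptions made for illustration, not Pypiper's or pipestat's actual behavior.

```python
import csv
import io


def stats_rows_to_results(tsv_text, pipeline_name="default_pipeline_name"):
    """Group stats.tsv rows (key, value, scope) into a nested mapping.

    Hypothetical helper: the nesting mirrors the YAML layout shown above,
    with the third column assumed to identify the record.
    """
    results = {pipeline_name: {"project": {}, "sample": {}}}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        key, value, scope = row
        # Create the record for this scope on first use, then store the stat.
        results[pipeline_name]["sample"].setdefault(scope, {})[key] = value
    return results


example = "key1\tabc\tsample_pipeline\nkey2\tdef\tshared\n"
print(stats_rows_to_results(example))
```

In a real pipeline manager, each (key, value) pair would presumably be handed to pipestat.report rather than collected in a dictionary.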

donaldcampbelljr commented 1 year ago

Per discussion, the pipeline manager should use pipestat.report for the following files:

      self.pipeline_profile_file = pipeline_filepath(self, suffix="_profile.tsv")
      self.pipeline_stats_file = pipeline_filepath(self, filename="stats.tsv")
      self.pipeline_figures_file = pipeline_filepath(self, filename="figures.tsv")
      self.pipeline_objects_file = pipeline_filepath(self, filename="objects.tsv")

What was originally reported in separate .tsv files will be reported in a single .yaml file.
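
As a rough illustration of that consolidation, the sketch below gathers rows from several tab-separated reports under one mapping and prints it in a YAML-like layout. The report kinds follow the four files listed above, but the row contents, the helper names combine_reports and to_yaml_lines, and the output layout are assumptions. The hand-written emitter handles only nested dictionaries of plain strings and is not a substitute for a real YAML library or for pipestat itself.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the four .tsv files; the rows are
# placeholders, not real Pypiper output.
TSV_REPORTS = {
    "profile": "",
    "stats": "key1\tabc\tsample_pipeline\nkey2\tdef\tshared\n",
    "figures": "",
    "objects": "",
}


def combine_reports(tsv_reports):
    """Collect the rows of every report under one mapping, keyed by report kind."""
    combined = {}
    for kind, text in tsv_reports.items():
        rows = csv.reader(io.StringIO(text), delimiter="\t")
        # Keep the first two columns (key, value); skip blank or short rows.
        combined[kind] = {row[0]: row[1] for row in rows if len(row) >= 2}
    return combined


def to_yaml_lines(mapping, indent=0):
    """Render a nested dict of strings as YAML-like lines (no quoting or escaping)."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            if value:
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml_lines(value, indent + 2))
            else:
                lines.append(f"{pad}{key}: {{}}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


print("\n".join(to_yaml_lines(combine_reports(TSV_REPORTS))))
```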

Annotations will also be deprecated.