Feature: tool for users to rescue v0.3.1 runs and to re-compute output counts in general

As of v0.3.0, running cellbender is split into two main steps:

(1) training the model and creating/saving the posterior as posterior.h5
(2) using the posterior to estimate the final denoised output count matrix

Step (2) can be run with a variety of different settings, and step (1) is independent of these settings. They include --fpr as well as many options that users probably do not commonly use, such as the choice of output estimator.

This means that step (1) only needs to run once: users can then re-generate the denoised output counts with different settings (such as different --fpr values) without re-running the time-consuming step (1).
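To make the separation concrete, here is a toy sketch in plain Python. It is not CellBender's data layout or its actual estimators: the `posterior` dictionary, the `noise_mean` and `noise_quantile` functions, and the quantile threshold (loosely standing in for a tunable setting such as --fpr) are all hypothetical and chosen only for illustration. The point is that the expensive result is stored once and several cheap output computations reuse it.

```python
# Toy illustration only: NOT CellBender's real posterior format or estimators.

# Result of "step (1)", computed once and stored: for each count-matrix entry,
# a probability distribution over how many of its counts are noise.
# posterior[entry][k] = P(noise count == k)
posterior = {
    ("cell_A", "gene_1"): [0.70, 0.20, 0.10],        # observed count is 2
    ("cell_A", "gene_2"): [0.05, 0.15, 0.35, 0.45],  # observed count is 3
}
observed = {("cell_A", "gene_1"): 2, ("cell_A", "gene_2"): 3}


def noise_mean(probs):
    """Posterior mean noise count, rounded to the nearest integer."""
    return round(sum(k * p for k, p in enumerate(probs)))


def noise_quantile(probs, q):
    """Smallest noise count k whose cumulative probability reaches q."""
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if cumulative >= q:
            return k
    # Fall back to the largest value if rounding kept the sum just below q.
    return len(probs) - 1


def denoise(estimator):
    """'Step (2)': subtract the estimated noise from each observed count."""
    return {key: observed[key] - estimator(posterior[key]) for key in observed}


# Three different outputs from the same stored posterior; step (1) is never re-run.
counts_mean = denoise(noise_mean)
counts_q50 = denoise(lambda probs: noise_quantile(probs, 0.50))
counts_q95 = denoise(lambda probs: noise_quantile(probs, 0.95))

for name, counts in [("mean", counts_mean), ("q50", counts_q50), ("q95", counts_q95)]:
    print(name, counts)
```

In this toy, a stricter quantile removes more counts, while the stored posterior itself never changes.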

Until now, the only way to do this was to re-run cellbender with the same input arguments (changing only --fpr, for instance), point cellbender to the correct checkpoint file, and hope that cellbender would reuse the cached computation from that checkpoint. This did work.

But it's more straightforward to have a separate utility for this purpose. (This tool will also allow users to re-compute an output count matrix from v0.3.1 runs, whose outputs were compromised but whose posterior.h5 files are perfectly fine. This allows users to run an inexpensive, CPU-only compute step to resurrect their v0.3.1 runs.)

Proposal

Recompute an output at a different FPR:

cellbender re-remove-background --input my_raw_count_matrix.h5 --posterior my_v0.3.0_posterior.h5 --fpr 0.05

Compute a valid output using a (problematic) run of v0.3.1:

cellbender re-remove-background --input my_raw_count_matrix.h5 --posterior my_v0.3.1_posterior.h5 --fpr 0.01

Do something fancy and specify a different way to compute an output:

cellbender re-remove-background --input my_raw_count_matrix.h5 --posterior my_posterior.h5 --estimator mean
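If the proposed subcommand is adopted as written above, sweeping several --fpr values against one posterior could be scripted along these lines. This is only a dry-run sketch: `build_command` is a hypothetical helper, the file names are the placeholders used above, and the script only prints the command lines instead of running them, since the proposal does not yet say how output files would be named.

```python
import shlex


def build_command(input_file, posterior_file, fpr):
    """Assemble the argument list for the proposed (not yet existing) subcommand."""
    return [
        "cellbender", "re-remove-background",
        "--input", input_file,
        "--posterior", posterior_file,
        "--fpr", str(fpr),
    ]


# One command per FPR value, all reusing the same posterior file.
commands = [
    build_command("my_raw_count_matrix.h5", "my_posterior.h5", fpr)
    for fpr in (0.01, 0.05, 0.1)
]

# Dry run: print each command line rather than executing it.
for cmd in commands:
    print(shlex.join(cmd))
```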
