As of v0.3.0, running cellbender has been separated into two main steps:

1. training the model and creating/saving the posterior as `posterior.h5`
2. using the posterior to estimate the final denoised output count matrix
Step (2) can be run with a variety of settings, while step (1) is independent of them. These settings include `--fpr` as well as less commonly used options, such as different choices of output estimator.

This means that step (1) runs once and for all: users can regenerate the denoised output counts with different settings (for example, different `--fpr` values) without repeating the time-consuming step (1).
Until now, the only way to do this was to re-run cellbender with the same input arguments (apart from `--fpr`, for instance), point cellbender to the correct checkpoint file, and hope that cellbender would reuse the cached computation from the checkpoint. This did work.
But it is more straightforward to have a separate utility for this purpose. (This tool will also allow users to re-compute an output count matrix from v0.3.1 runs, whose outputs were compromised but whose `posterior.h5` files are perfectly fine. This lets users run an inexpensive, CPU-only compute step to resurrect their v0.3.1 runs.)
## Proposal
Recompute an output at a different FPR:
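A sketch of what this could look like. The subcommand name and flags below (`cellbender posterior`, `--posterior`, `--output`) are illustrative assumptions, not a finalized CLI; only `--fpr` exists in the current tool:

```shell
# Hypothetical invocation -- subcommand and flag names are illustrative.
# Load the saved posterior from a completed run and re-estimate the
# denoised counts at a new FPR, skipping model training entirely.
cellbender posterior \
    --posterior posterior.h5 \
    --fpr 0.05 \
    --output output_fpr_0.05.h5
```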
Compute a valid output using a (problematic) run of v0.3.1:
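Sketched with the same hypothetical subcommand and flags as above (illustrative, not final). The point is that a v0.3.1 run's intact `posterior.h5` is sufficient input, so no GPU or retraining is needed:

```shell
# Hypothetical invocation -- subcommand and flag names are illustrative.
# The v0.3.1 output matrix was compromised, but its posterior.h5 is fine:
# a CPU-only pass over the posterior regenerates a valid output.
cellbender posterior \
    --posterior v0.3.1_run/posterior.h5 \
    --fpr 0.01 \
    --output corrected_output.h5
```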
Do something fancy and specify a different way to compute an output:
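Again a hypothetical sketch; the `--estimator` flag and its value are illustrative assumptions standing in for "a different choice of output estimator" mentioned above:

```shell
# Hypothetical invocation -- subcommand and flag names are illustrative.
# Select a non-default output estimator (e.g. a posterior-mean estimate)
# when computing the denoised count matrix from the saved posterior.
cellbender posterior \
    --posterior posterior.h5 \
    --estimator mean \
    --output output_mean_estimator.h5
```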