dievsky closed this 5 years ago.
As is customary, I'll wait a while for opinions.
I'd rather stay with the default strategy after fixing #1408. In that case, the model name will be built from the track and control names, and it will work without any additional parameters.
As I mentioned, the proposed approach would be completely optional and backwards-compatible. If the user doesn't need a discoverable model, they can skip the argument and get the default-named model file. So Span would continue to work without any additional parameters, while still letting the user discover the model file when needed and reuse model files to call and tune peaks from the command line.
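To make the backwards-compatibility argument concrete, here is a minimal sketch of how an optional model path could be resolved, written in Python purely for illustration. The names resolve_model_path and default_model_name are hypothetical, and default_model_name is a crude stand-in for Span's real ID reducer: it just joins the stems of the track and control file names. Omitting the argument falls back to the default-named file, a relative path is taken relative to the fit directory, and an absolute path is used as is.

```python
from pathlib import Path
from typing import Optional, Sequence


def default_model_name(treatment: Sequence[str], control: Sequence[str]) -> str:
    # Hypothetical stand-in for the ID-reduced default name:
    # join the file stems of all treatment and control tracks.
    stems = [Path(p).stem for p in list(treatment) + list(control)]
    return "_".join(stems) + ".span"


def resolve_model_path(fit_dir: Path, treatment: Sequence[str],
                       control: Sequence[str],
                       model: Optional[str] = None) -> Path:
    """Return the path the model would be saved to and loaded from."""
    if model is None:
        # No argument given: keep the current default-named behavior.
        return fit_dir / default_model_name(treatment, control)
    path = Path(model)
    # Relative paths live under the fit directory; absolute ones are kept.
    return path if path.is_absolute() else fit_dir / path
```

With this shape, a pipeline that passes an explicit name always knows the output file in advance, while callers that pass nothing see no change.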
After #1408, the model name is even less discoverable than before, since there are now more reducing options. This poses a problem for computational pipelines that treat the model file as (intermediate) output. The only current solutions are:
.span file in the cache.
Can you please provide an example where naming became worse after #1408?
Also, there are no pipelines that deal with models directly (except model tuning in JBR).
My own scATAC-seq pipeline deals with models directly. :))
The discoverability problem is solved by pull request #3. I'd still like to make peak calling possible without treatment files when the model is provided (currently they're still required arguments), but that can wait until a later release.
Already implemented as of https://github.com/JetBrains-Research/span/issues/9
Inspired by #1408.
Preamble: My Span-fitting Snakemake pipeline needs to declare an output file, and there's no pattern I can specify and be sure that the model name will conform to (thanks to the ID reducer). With peaks, we can specify a peak file name directly. With the model, not so much, yet the model files are exposed to the user too (we encourage viewing them in JBR).
Thesis: I propose adding an optional command line argument --model specifying a path to a model file. A relative path is interpreted as relative to the standard fit directory. Span then uses the provided path to save and load the model. If no --model is provided, well, bring on the ID reducer. If the model exists, the treatment and control paths are only checked against the model info.
Benefits:
Disadvantages:
Span would refuse to load the model if the paths didn't match.
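The save-and-load behavior from the proposal, including the refusal listed under disadvantages, might look roughly like this. It is a sketch under loud assumptions: the model is a JSON dictionary rather than a real .span file, fit is a caller-supplied placeholder for the actual fitting step, and load_or_fit and ModelMismatchError are hypothetical names. An existing model is reused only if the treatment and control paths recorded in it match the ones given now; otherwise it is refused.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical stand-in: a "model" is a JSON dict that also records the
# (absolute) treatment and control paths it was fitted on.
Model = Dict[str, object]


class ModelMismatchError(Exception):
    """Raised when an existing model was fitted on different input paths."""


def _normalize(paths: List[str]) -> List[str]:
    # Absolute paths, so that "a.bam" and "./a.bam" compare equal.
    return [str(Path(p).resolve()) for p in paths]


def load_or_fit(model_path: Path, treatment: List[str], control: List[str],
                fit: Callable[[List[str], List[str]], Model]) -> Model:
    if model_path.exists():
        model = json.loads(model_path.read_text())
        # Existing model: inputs are only checked against the stored info.
        if (_normalize(treatment) != model["treatment"]
                or _normalize(control) != model["control"]):
            raise ModelMismatchError(f"{model_path} was fitted on different inputs")
        return model
    # No model yet: fit it, record the inputs and save to the requested path.
    model = fit(treatment, control)
    model["treatment"] = _normalize(treatment)
    model["control"] = _normalize(control)
    model_path.parent.mkdir(parents=True, exist_ok=True)
    model_path.write_text(json.dumps(model))
    return model
```

Storing normalized paths at save time keeps the later comparison independent of how the user spelled the paths, although an order-sensitive list comparison is itself a simplifying assumption.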