Allow Span CLA to call and tune peaks with just the model file present

JetBrains-Research / span

SPAN Semi-supervised Peak Analyzer

https://doi.org/10.1093/bioinformatics/btab376

MIT License

9 stars 1 forks source link

Allow Span CLA to call and tune peaks with just the model file present #9

Closed dievsky closed 5 years ago

dievsky commented 5 years ago

Expansion of issue #6 . Currently, we can pass the model file to Span CLA using --model parameter. It overrides the default path and naming convention. However, we still need to supply the treatment (and possibly control) file paths, bin size and the path to chrom.sizes; these are checked against the ones stored in the model file.

In an ideal world, having the model file is perfectly enough to call peaks. We don't need access to treatment and/or input files, we don't even need the coverage cache, we don't need anything but the posterior null hypothesis probabilities in the model file.

The proposed solution is:

make the -t and --cs arguments optional, but require either -t and --cs, or -m;
allow the various ModelFitExperiment descendants to have an empty treatment path list;
don't check the paths if they're empty, and the bin size if it wasn't provided.

dievsky commented 5 years ago

The only downside is that without any coverage information, we won't be able to provide the peak "value". Surprisingly, SpanCLA doesn't provide it even when the coverage is available, so this won't be a regression.

olegs commented 5 years ago

The latter looks like a bug in the context of SpanCLA.

dievsky commented 5 years ago

@olegs I wasn't sure whether it was a bug or some ancient design decision. I'll investigate.

dievsky commented 5 years ago

To implement this, we should make all SpanFitInformation fields nullable. The problem is that fragment is already nullable, and we can't reliably infer whether fragment was meant to be auto or not supplied, which in our intended case can lead to confusion:

<span.jar> -t treatment.bam --cs build.chrom.sizes --fragment 0 --model model.span
<span.jar> -t treatment.bam --cs build.chrom.sizes --model model.span

What should the second invocation do -- just use the model generated by the first one, because the user clearly doesn't care about the fragment size, or fail with an illegal state exception, since the model was generated for fragment size 0, and now the user obviously wants auto?

dievsky commented 5 years ago

That's why I intend to:

refactor fragment from Int? to Optional<Int>;
allow Span to accept --fragment auto parameter;
unequivocally treat missing fragment option as "I don't care";
and only then make all SpanFitInformation fields nullable.

olegs commented 5 years ago

Is it fixed already?

dievsky commented 5 years ago

Yep! Even the peak value absence was fixed in a separate PR.