Summarization subpackage baseline

aimalz commented 3 years ago

superclass/subclass structure like Estimator()
may call Estimator() subclasses
"dummy" method would be stacking results of a given Estimator() subclass
also include one easy SOM-based direct-from-colors method [make a separate issue]
maybe include qit-based CHIPPR implementation if scalable

aimalz commented 3 years ago

@johannct made a good point about Summarizer subclasses that want posteriors output by Estimator subclasses instead of just input photometry from Creator subclasses. Rather than letting those summarizers have a method that calls an estimator, it should instead have keywords in its config file that specify what estimator it wants to run (and where to find its config file) and/or the path to the estimated posteriors it wants to read in. This would kind of be analogous to the estimator subclasses that take completely different inputs from one another. The advantage is that no estimators are hidden in summarizers.

aimalz commented 3 years ago

@AngusWright This would be a good opportunity to wrap your SOM_DIR code as a Summarizer() subclass, which would have the added bonus of being the first example of RAIL calling underlying R code.

aimalz commented 3 years ago

Maybe what makes sense is for a Summarizer() superclass to live in rail.estimation so something like The WiZZ or a SOM can inherit from both Summarizer() and Estimator().

joezuntz commented 3 years ago

I've been thinking about this a bit and two conclusions I've come to:

the variety in what summarizers take as inputs is very large; this means the super classes will end up very thin, I think only for configuration stuff. Possibly you could add some high level methods like __init__, train, run or something
I'd suggest summarizers that need a PDF input explicitly take that input, rather than trying to run an estimator internally.
we should think of post-burner processes (like some variants of CC) as an additional that mutates an existing n(z), rather than a standard Summarizer step.

My main suggestion is that we get the code in place first before trying to push things under a single superclass.

aimalz commented 2 years ago

@joezuntz Just caught this while catching up on a GitHub notification backlog, sorry! I think we discussed this offline, but in case I'm mistaken, or just for completeness, I'll also reply to your above comment here.

The plan is now to restructure the rail.estimation subpackage to have a three-tier superclass/subclass structure in which there is an Estimator() superclass that contains the I/O interface and most general methods (i.e. the __init__, train, estimate, etc. steps you cited) with a pair of subclasses for estimating per-galaxy photo-z information (named something like PZEstimator()) and population-level redshift information (named something like NZEstimator()) that themselves have subclasses for what we currently think of as estimators and summarizers; the wrappers of methods that produce both per-galaxy and ensemble-level redshift information would then be able to inherit from both of the superclasses. This plan addresses your three points above:

We'd minimize the redundant of overhead for the shared I/O of flexible data types, which has the added boon of further reducing the number of files to modify if the I/O API changes in the future.
It would be no more difficult to have summarizers accept PDFs as input than under the original design, and it enables wrapping methods to derive either/both population-level and per-galaxy redshift information in exactly the same way as described above (of possible interest to @biprateep @brettandrews @janewman-pitt-edu for per-galaxy photo-z posterior estimation methods like that of Bordoloi+10).
And, again, it would be no more difficult for any wrapped Estimator() sub-subclasses to accept estimates of an ensemble redshift distribution.

But it totally makes sense to draft up some individual methods before solidifying a top-level structure! It seems to me like proto-wrapping the guts of TXPipe's DIR pipeline stage LSSTDESC/TXPipe#193 would be low-hanging fruit (along with the others listed under "Development of n(z) estimators" in the overall plan for the nz-band-sensitivity project LSSTDESC/nz-band-sensitivity#1), but perhaps I'm behind on the status of the standalone SOM summarizer.

eacharles commented 2 years ago

overtaken by events

LSSTDESC / rail_attic

Summarization subpackage baseline #56