New configuration to specify qp.Ensemble parameterization in `CatEstimator` stages

drewoldag commented 1 year ago

Currently almost all subclasses of rail_base.estimator.CatEstimator store their resulting qp.Ensembles using a qp.interp gridded representation. We should add a new configuration parameter that lets users select which qp representation they prefer (e.g., qp.hist, qp.spline, qp.packed_interp).

As with issue #11, the change needed in this repository (rail_base) is relatively small, but making all of the subclasses of CatEstimator respect the new configuration parameter will be substantial.

Several Jupyter notebooks will likely need updating as well, but we do not currently have an exhaustive list of which notebooks will be affected.

eacharles commented 5 months ago

So, a lot of the estimators have native representations of ensembles. How would you propose to handle this in those cases?

aimalz commented 5 months ago

In those cases, the default value of the configuration parameter for that stage would just be the (known, for that stage) native parameterization, no?

eacharles commented 4 months ago

A couple of thoughts. 1) I think we should do this in a way that touches only the base class code, not any of the subclasses, as that would be rather disruptive. This is going to be kind of tricky because we don't just write the ensemble at the end; rather, we allocate the memory at the beginning of run() and then fill it in from the parallel processes. That is, we will have to modify the _run() and _do_chunk_output() methods to do this.

2) I think a better solution than requiring parameters for the output representation would be to use parameters that default to None but that allow you to force the qp representation to a particular type.
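A minimal sketch of what "default to None but allow forcing" could look like. The parameter names `qp_output_classname` and `qp_output_class_pars` are the ones used later in this comment; the dataclass itself is a hypothetical stand-in, not the actual stage configuration mechanism, and the `"hist"` / `bins` values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class OutputRepresentationConfig:
    """Hypothetical container for the two proposed options."""

    # None means "keep whatever representation the estimator produces natively".
    qp_output_classname: Optional[str] = None
    # Extra keyword arguments passed to the conversion, if one is requested.
    qp_output_class_pars: Dict[str, Any] = field(default_factory=dict)

    def wants_conversion(self) -> bool:
        """True only when the user explicitly forced a representation."""
        return self.qp_output_classname is not None


# Default: nothing changes for existing estimators.
default_cfg = OutputRepresentationConfig()

# Forced: the user asks for a specific representation (illustrative values).
forced_cfg = OutputRepresentationConfig(
    qp_output_classname="hist",
    qp_output_class_pars={"bins": [0.0, 1.0, 2.0, 3.0]},
)
```

With this shape, estimators that have a native representation need no per-stage default: leaving the option unset is the "native" choice.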

Either of the following would allow you to convert from one representation to another.

The function qp.factory.convert(in_dist, class_name, **kwds), used as
new_ensemble = qp.factory.convert(orig_ensemble, self.config.qp_output_classname, **self.config.qp_output_class_pars)

or the method qp.Ensemble.convert_to(self, to_class, **kwargs), used as
new_ensemble = orig_ensemble.convert_to(qp.factory.stats[self.config.qp_output_classname], **self.config.qp_output_class_pars)

So, this could be something like:

    if self.config.qp_output_classname is not None:
        new_ensemble = orig_ensemble.convert_to(qp.factory.stats[self.config.qp_output_classname], **self.config.qp_output_class_pars)
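The same idea can be written as a small helper that lives entirely in the base class. This is a sketch under stated assumptions: `maybe_convert` is a hypothetical name, `registry` stands in for `qp.factory.stats`, and `StandInEnsemble` only mimics the `convert_to(to_class, **kwargs)` call shape quoted above so the example runs without qp installed.

```python
from typing import Any, Mapping, Optional


def maybe_convert(ensemble: Any,
                  classname: Optional[str],
                  class_pars: Optional[Mapping[str, Any]],
                  registry: Mapping[str, Any]) -> Any:
    """Return `ensemble` unchanged, or converted to the class named `classname`."""
    if classname is None:
        # No override requested: keep the estimator's native representation.
        return ensemble
    if classname not in registry:
        raise KeyError(f"Unknown qp representation: {classname!r}")
    return ensemble.convert_to(registry[classname], **dict(class_pars or {}))


class StandInEnsemble:
    """Hypothetical stand-in for qp.Ensemble; records only the representation."""

    def __init__(self, representation: str) -> None:
        self.representation = representation

    def convert_to(self, to_class: Any, **kwargs: Any) -> "StandInEnsemble":
        # A real conversion would resample the PDFs; this only notes the target.
        return StandInEnsemble(to_class)


registry = {"hist": "hist", "interp": "interp"}  # stand-in for qp.factory.stats
native = StandInEnsemble("interp")

unchanged = maybe_convert(native, None, None, registry)
converted = maybe_convert(native, "hist", {"bins": [0.0, 1.0, 2.0]}, registry)
```

With qp itself, the call would presumably be `maybe_convert(orig_ensemble, self.config.qp_output_classname, self.config.qp_output_class_pars, qp.factory.stats)`. Where that call belongs relative to the up-front allocation in run() is the open question raised in point 1.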

LSSTDESC / rail_base

New configuration to specify qp.Ensemble parameterization in `CatEstimator` stages #28