LSSTDESC / cosmodc2

Python package creating the cosmoDC2 synthetic galaxy catalog for LSST-DESC

automate model optimization #31

Open aphearin opened 5 years ago

aphearin commented 5 years ago

While the functional forms for the cosmoDC2 model were being built, the validation criteria in DESCQA were evolving rapidly; moreover, some of the criteria require @erykoff and cannot be run by the CS group on their own. It was therefore not practical to set up a framework for systematic parameter optimization, so we have been relying largely on hand-tuned parameters, which is laborious and limits the accuracy we can achieve. In particular, with the current modeling pipeline I think it would be functionally impossible to get good fits to all of the clustering and lensing data outlined in the nice validation tests proposed by @chihway in this DESCQA Issue. So, to move the modeling forward in a way that is meaningful for accurate validation of clustering and lensing pipelines, some automation of mock generation and DESCQA validation will be needed.

Below is a sketch of a workflow for this automated optimization. Reasonably efficient machinery for most/all of the mock-population side is already fully developed, though some modest effort from me, @dkorytov, and @evevkovacs would be required to streamline our current pipeline and move away from the current "staged" implementation. Even with only minor changes to the current modeling approach, I think this would be worth doing, though time/labor permitting it would be better to do this exercise in tandem with changes to the underlying model as described below and in #17 and #29.

  1. Set up a Latin hypercube of the model parameters, centered on the current best-fit model. To get started, something simple like the pyDOE package could be used.
  2. For each point in parameter space, use the model to populate a mock into a small lightcone cutout, and also into a few snapshots.
  3. Calculate summary statistics for the model realization, e.g., the SMF(z) in the lightcone, HSC dn/dmag, DEEP2 dn/dz, M*-dependent clustering and gg-lensing (or just DeltaSigma(Rp)) at a few snapshots, etc. When available, DESCQA implementations of these summary statistic calculations should be used (cc @yymao - if any DESCQA refactoring is required by this effort, this would be the pinch point, but I think the current API already more or less satisfies this).
  4. Loop over the points in the hypercube and write all model parameters and data vectors to disk.
  5. This dataset will enable lots of post-processing analyses related to optimizing model parameters under different loss functions, building emulators, etc.
  6. After selecting a particular good-fitting model, a full-scale universe can be generated on NERSC and run through the full DESCQA machinery.
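Steps 1 and 4 above could be sketched roughly as follows. This is only an illustration: the helper below is a minimal stand-in for pyDOE's `lhs` sampler, and the parameter centers/widths are made-up placeholders, not actual cosmoDC2 model parameters.

```python
import numpy as np

def latin_hypercube(center, widths, n_samples, rng=None):
    """Return an (n_samples, n_dim) Latin hypercube centered on `center`.

    Minimal stand-in for pyDOE's `lhs`: each dimension is split into
    n_samples equal bins, one point is drawn per bin, and the bin order
    is shuffled independently per dimension.
    """
    rng = np.random.default_rng(rng)
    center = np.asarray(center, dtype=float)
    widths = np.asarray(widths, dtype=float)
    n_dim = center.size
    # Stratified uniform draws in [0, 1): one sample per bin per dimension
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dim))) / n_samples
    for j in range(n_dim):
        u[:, j] = rng.permutation(u[:, j])
    # Map the unit cube onto [center - width/2, center + width/2)
    return center + (u - 0.5) * widths

# Hypothetical 2-parameter example centered on a "current best fit"
samples = latin_hypercube(center=[1.0, 0.5], widths=[0.2, 0.1], n_samples=50, rng=0)
```

Each row of `samples` would then be fed to the mock-population step, and the resulting summary statistics written to disk alongside the parameter vector.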

The current cosmoDC2 production pipeline has several "stages" that would need to be consolidated to do this with the current modeling approach. This effort also fits naturally with the modeling developments outlined in #17: implementing #17 would eliminate the sampling of UniverseMachine into Outer Rim from the pipeline, so that Outer Rim is instead populated using parameterized fitting functions for M* and SFR-percentile that are calibrated simultaneously. It would also make sense to do this with the updated Galacticus library described in #29 that @abensonca is working on, since the cross-matched population will be denser and more homogeneously distributed in color-color-magnitude-redshift space, which is important for achieving good-fitting, smooth color PDFs that are free of discontinuities and discreteness effects.

Comments welcome from anyone, either on this workflow or a different idea for a longer term optimization effort based on DESCQA validation. CC also @katrinheitmann @rmandelb @danielsf

yymao commented 5 years ago

@aphearin thanks for this nice summary of the current state and your vision for the path forward. I think this is a good plan, and it will certainly make it easier for people in the analysis WGs to contribute.

Re: your question about whether refactoring DESCQA will be required for this automated process --- I think the answer is most likely no (because we've done that already :slightly_smiling_face:), or at least not in a significant way. In fact, each DESCQA test can be imported and run in another Python script, if that's what's needed. But in any case, I'll be happy to help improve the interface between DESCQA and your pipeline.
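A hypothetical sketch of that pattern, driving a DESCQA-style test from a plain Python script. The class below only mimics the shape of a DESCQA validation test (an entry point that takes a catalog and produces a data vector); in practice one would import the actual test from the DESCQA package rather than defining a stand-in, and the catalog would be a GCR catalog instance rather than a dict of arrays.

```python
import numpy as np

class StellarMassFunctionTest:
    """Stand-in for a DESCQA-style validation test: computes a data
    vector (here, a toy stellar-mass-function histogram) from a
    catalog-like object."""

    def __init__(self, bins):
        self.bins = np.asarray(bins)

    def run_on_single_catalog(self, catalog):
        # In DESCQA the catalog would be a GCR catalog instance;
        # here a dict of numpy arrays stands in for it.
        log_mstar = np.log10(catalog["stellar_mass"])
        counts, _ = np.histogram(log_mstar, bins=self.bins)
        return counts  # the "data vector" to record on disk

# Toy mock catalog: 1000 galaxies with lognormal stellar masses
rng = np.random.default_rng(42)
mock = {"stellar_mass": 10 ** rng.normal(10.0, 0.5, size=1000)}

test = StellarMassFunctionTest(bins=np.arange(7.0, 13.5, 0.5))
data_vector = test.run_on_single_catalog(mock)
```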

One point I want to mention is that we probably don't need to optimize the model for all DESCQA tests. While some tests are certainly checking for "failure on this means we cannot do science," other tests, I think, are just checking for features that "would be nice to have, but no big deal if not," and some are in the "we just want to see what this would look like" category. So this model optimization should carefully distinguish these three cases so that we don't "over-optimize" it (is that a thing? anyway, I know you know what I mean :slightly_smiling_face: ).

aphearin commented 5 years ago

@yymao - yes, I know what you mean about over-optimizing, and of course I agree. I was short on details about that, but the main idea is to select a few of the most sensitive and critical tests (luminosity functions, color PDFs, gg-lensing, clustering), and only optimize for those.

Re: DESCQA interface. In the Latin hypercube exercise sketched here, I think we would not actually run the DESCQA tests as-is, but rather peel apart the data vectors that DESCQA calculates from the mock, and record those data vectors on disk for subsequent cost-function minimization. So the only refactoring that might be required is for a test used in the optimization that does not expose its data vector in the return value of a callable implemented in the test. Even for tests that are not currently factored in a way that permits this, any required refactoring seems like it would be rather trivial.
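The post-processing step described here could be sketched as below: given the stored (parameters, data vector) pairs from the hypercube run, select the sample minimizing a chi-square-style cost against a target data vector. All names are illustrative placeholders, not actual pipeline code, and a real analysis would likely fit an emulator rather than just pick the minimum stored sample.

```python
import numpy as np

def best_fit_sample(params, data_vectors, target, sigma=1.0):
    """Return (best_params, min_cost), where the cost for each stored
    sample is sum(((data_vector - target) / sigma) ** 2)."""
    resid = (np.asarray(data_vectors, dtype=float) - np.asarray(target)) / sigma
    cost = np.sum(resid ** 2, axis=1)
    i = int(np.argmin(cost))
    return np.asarray(params)[i], float(cost[i])

# Toy example: three stored hypercube samples, each with a 2-element
# data vector; the middle sample matches the target exactly.
params = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
data_vectors = [[0.0, 0.0], [5.0, 5.0], [9.0, 9.0]]
best, cost = best_fit_sample(params, data_vectors, target=[5.0, 5.0])
```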