LSSTDESC / tomo_challenge

2020 Tomographic binning challenge

Request for clarification/specificity #2

Open aimalz opened 4 years ago

aimalz commented 4 years ago

@sschmidt23 and I have some questions/thoughts on the setup/scope/goals of the challenge that we're hoping to see addressed in the README (or something equally visible).

  1. Though it's partly demonstrated by the example, we'd like this repo to also include a more general explanation of what constitutes an entry. As a starting point for discussion, one could define a "tomographic binning scheme" as an algorithm that accepts some prior information (provided by the challenge organizers, currently phrased as a "training set," though I'd request that it be generalized so as to invite non-machine-learning entries) and a galaxy catalog of a given format (the example adequately shows that this information could include a variable number of fluxes), and outputs bin assignments for all galaxies in the catalog. In this context, an "algorithm" could be anything from simple cuts on a photo-z point estimate to a trained machine learning model or an optimization procedure that interfaces with the cosmological inference module. So, would an entry then be a python script (plus any associated files/modules it may require) with the exact same main() function as in the example, or something like that? Even if the answer isn't the same as the above proposal, it would be helpful to have some explanatory text spelling it out.

  2. In the example, the metric is evaluated per bin. Will all entries need to share the same number of bins or is that something we're allowed/encouraged to optimize over? If it must be shared, will it be specified ahead of time for the whole challenge or assumed to be an input parameter passed to the relevant step in the main() function? (It goes without saying, but I'd be in favor of both requiring that entries accept as input a variable number of bins and encouraging optimization of bin number for an overall metric.)

  3. Related to the per-bin scores: obviously they can't be directly compared across competing algorithms, because the bin memberships are independent and the bins won't necessarily even be well-ordered. How will the per-bin scores from the power spectrum S/N metric be combined into a scalar score to identify a winner? (In PLAsTiCC, this superficially inconsequential choice turned out to be the deciding factor for the whole challenge!)

  4. The previous two questions might be rendered obsolete by the Fisher matrix-based metric mentioned in the README once it's fleshed out, but is the plan to calculate some kind of figure of merit in the space of cosmological parameters? (If so, I'd instead suggest something based on the Kullback-Leibler divergence in that space, for which I can provide code if desired; see the sketch after this list.)

  5. In planning for the TXSelector project, we discussed a series of metrics evaluated at intermediate steps between the bin definitions/assignments and constraints on the cosmological parameters, the simplest of which quantified the overlap between n(z) in bins; if the power spectrum S/N is metric 1 and the Fisher matrix idea mentioned in the README is metric 2, the one we were thinking of as a starting point for the TXSelector project could have been metric 0. I think it's useful to build up the metrics in this way because it tells us something about how well metrics on intermediate steps probe the thing we really care about, the posterior over the cosmological parameters, which could be considered an interesting result in and of itself. Is there a plan to do this? (For what it's worth, I think the cross-entropy might be appropriate for a baseline n(z) overlap metric.)
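To illustrate the kind of KL-based figure of merit mentioned in point 4, here is a minimal sketch assuming Gaussian approximations to the cosmological-parameter posteriors; the function name and the choice of reference posterior are placeholders rather than a concrete proposal.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL divergence D(N0 || N1) between two Gaussian approximations of the
    cosmological-parameter posterior, e.g. the posterior from a candidate
    binning scheme vs. a reference (idealized-binning) case. Larger values
    mean the candidate posterior strays further from the reference."""
    k = len(mu0)
    inv_cov1 = np.linalg.inv(cov1)
    diff = np.asarray(mu1) - np.asarray(mu0)
    trace_term = np.trace(inv_cov1 @ cov0)
    quad_term = diff @ inv_cov1 @ diff
    logdet_term = np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
    return 0.5 * (trace_term + quad_term - k + logdet_term)
```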

Thanks in advance!

joezuntz commented 4 years ago

Hi both. Many thanks for this.

  1. We're looking at what submissions should look like right now in the chal_refactor branch. The plan is that a submission method should be a class with train and apply methods that will be run first on the training information and then on a final sample.

But broadly, yes, any function that can map to a bin selection for each object (or for a subset of the objects) is what we have in mind.

We definitely don't mean to restrict to ML-type methods - anything that might work is great! In practical terms, are there other inputs or prior information that you're thinking of in particular that alternative methods might need? Or is it a purely linguistic difference?
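For concreteness, here is a minimal sketch of the kind of submission class described above. Only the train/apply interface reflects the chal_refactor plan; the class name, constructor arguments, classifier choice, and binning strategy are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class ExampleBinner:
    """Illustrative submission: assign galaxies to tomographic bins with a
    classifier trained on bin labels derived from the training redshifts."""

    def __init__(self, n_bins=4):
        self.n_bins = n_bins

    def train(self, training_data, training_z):
        # Define bin edges from the training redshifts (equal-number bins
        # here), label each training galaxy with its bin, and fit a model.
        self.z_edges = np.quantile(training_z, np.linspace(0, 1, self.n_bins + 1))
        labels = np.digitize(training_z, self.z_edges[1:-1])
        self.model = RandomForestClassifier(n_estimators=100)
        self.model.fit(training_data, labels)

    def apply(self, data):
        # Return an integer bin index for every galaxy in the target catalog.
        return self.model.predict(data)
```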

  2. You're definitely encouraged to optimize the number of bins, and define the edges in any way you like! We don't want n_bin to be an explicit metric, since then you could output a large number of very strongly overlapping bins which would not be useful in practice. But adding moderately well-spaced bins should improve both the SNR and FOM metrics.

  3. The SNR metric is for all the bins collectively, and includes the covariance between them (a sketch of that combination appears below).

  4. Yep - again see the chal_refactor branch. We're currently looking at a Fisher-based FOM, but if you can suggest an alternative that would be very welcome (an illustrative sketch is below).

  5. Very much agree - having more metrics will certainly give us more nuance about where gains are coming from. What does cross-entropy look like for > 2 bins?
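On the collective S/N of point 3, a minimal sketch of the kind of combination described there, assuming the metric is a quadratic form of the stacked per-bin-pair spectra with their full covariance; this illustrates the idea rather than the repo's exact implementation.

```python
import numpy as np

def total_snr(cl_vector, covariance):
    """Collective S/N over all bins: sqrt(C_ell^T Cov^-1 C_ell), where
    cl_vector stacks the angular power spectra of every bin pair and
    covariance is the full covariance matrix, including cross-bin terms."""
    inv_cov = np.linalg.inv(covariance)
    return np.sqrt(cl_vector @ inv_cov @ cl_vector)
```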
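On the Fisher-based FOM of point 4, one standard possibility, again as an illustrative sketch rather than what chal_refactor actually implements, is a DETF-style figure of merit computed from the marginalized 2x2 parameter covariance.

```python
import numpy as np

def fisher_fom(fisher, i=0, j=1):
    """DETF-style figure of merit for a pair of cosmological parameters:
    invert the Fisher matrix, take the marginalized 2x2 block for
    parameters i and j, and return 1/sqrt(det) of that block."""
    cov = np.linalg.inv(fisher)
    block = cov[np.ix_([i, j], [i, j])]
    return 1.0 / np.sqrt(np.linalg.det(block))
```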
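On point 5, the usual generalization beyond two bins is the categorical cross-entropy; here is a minimal sketch of one way it could be applied to normalized n(z) histograms or bin-membership distributions. Exactly which distributions to compare is the open question the comment above is asking.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Categorical cross-entropy H(p, q) = -sum_k p_k log q_k between two
    normalized distributions over the same grid, e.g. histogram n(z)
    estimates for two tomographic bins, or true vs. assigned bin
    membership fractions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return -np.sum(p * np.log(q + eps))
```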