LSSTDESC / rail_base

Base classes for RAIL
MIT License

Refactor Evaluator/MetricEvaluator #14

Closed · aimalz closed this 5 months ago

aimalz commented 1 year ago

Riffing off @hdante's point in LSSTDESC/rail_base#11: the evaluation subpackage has some unexpected behavior that could make it unnecessarily computationally intensive. Currently the Evaluator in src/rail/evaluation is a meta-metric that evaluates all the specific metrics, while MetricEvaluator is the base class for individual metrics but lives in src/rail/evaluation/metrics. This is the reverse of how creation and estimation are set up, and it doesn't convey that Evaluator is not the base class but rather a shortcut for running all the subclasses that constitute metrics. We should refactor this (and propagate the changes through the unit tests and demos) so that the base class with the plain name lives outside the metrics directory and the meta-metric with an appropriately descriptive name lives inside it, mirroring the structures of creation and estimation. This will make it more straightforward for users to avoid costly metric calculations unless they truly need them.
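
For concreteness, a minimal sketch of the proposed layout, assuming hypothetical file and class names (the real stages would still carry the usual RAIL/ceci config machinery):

```python
# Hypothetical layout mirroring rail.creation / rail.estimation; file and
# class names here are illustrative, not a final API.
#
# src/rail/evaluation/evaluator.py            <- base class with the plain name
# src/rail/evaluation/metrics/<metric>.py     <- one subclass per concrete metric
# src/rail/evaluation/metrics/all_metrics.py  <- "run everything" meta-metric


class Evaluator:
    """Base class: one metric, computed only when explicitly requested."""

    def evaluate(self, estimate, reference):
        raise NotImplementedError


class AllMetricsEvaluator(Evaluator):
    """Convenience meta-metric that runs every concrete metric subclass."""

    def evaluate(self, estimate, reference):
        results = {}
        for cls in Evaluator.__subclasses__():
            if cls is AllMetricsEvaluator:
                continue  # skip the meta-metric itself
            results[cls.__name__] = cls().evaluate(estimate, reference)
        return results
```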

eacharles commented 1 year ago

I think you might be better off leaving it as it is and putting in code to select which metrics to calculate. Having all the metrics in a single file will be very useful; having them scattered across a number of small files will be annoying.
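
A minimal sketch of that kind of selection, keeping every metric in one module and using a hypothetical `metrics` config option to choose which ones get computed (the metric functions are placeholders):

```python
import numpy as np

# All metrics live in one module; a config option picks which to compute.
METRIC_FUNCS = {
    "bias": lambda est, ref: np.mean(est - ref),
    "scatter": lambda est, ref: np.std(est - ref),
    "outlier_rate": lambda est, ref: np.mean(np.abs(est - ref) > 0.15),
}


def evaluate(est, ref, metrics=("bias", "scatter")):
    """Compute only the metrics named in the hypothetical `metrics` option."""
    return {name: METRIC_FUNCS[name](est, ref) for name in metrics}


# e.g. evaluate(zmode, ztrue, metrics=["bias"]) skips the costlier metrics
```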

aimalz commented 1 year ago

@yanzastro had a very relevant thought on metrics for pipelines where true redshifts are not available: a pairwise difference between the qp.Ensemble.ancil arrays of point estimates, where rows of the Ensemble could be, for example, cells in a SOM (as per LSSTDESC/rail_som#1), once LSSTDESC/rail_som#2 produces photo-z PDF estimates per cell.
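
A minimal sketch of that pairwise comparison, assuming the point estimates have already been pulled out of each ensemble's ancil table as 1-d numpy arrays (the "zmode" column name in the usage comment is hypothetical):

```python
import numpy as np


def pairwise_point_difference(est_a, est_b):
    """All pairwise differences between two 1-d arrays of point estimates,
    e.g. one value per SOM cell taken from a qp.Ensemble ancil table.
    Returns an array of shape (len(est_a), len(est_b))."""
    est_a = np.asarray(est_a)
    est_b = np.asarray(est_b)
    return est_a[:, np.newaxis] - est_b[np.newaxis, :]


# e.g., assuming the ancil table behaves like a dict of arrays:
# diff = pairwise_point_difference(ens_a.ancil["zmode"], ens_b.ancil["zmode"])
```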

aimalz commented 1 year ago

Just catching this up with notes from the retreat before I get too far into implementing it in the corresponding branch: the plan is to define classes of metrics based on what input they take (the three logical pairings of point values and PDFs, where the estimate is always first and the truth/reference is always second). This will let users specify the subset of metrics they want to calculate as keyword config options while keeping the number of metric output files low (one per class), with the added bonus that which metrics are accessible through each class follows directly from its paired input types. Here's an outline of what's going to go where:

- PointToPointEvaluator: bias, scatter, "robust" versions thereof, IQR, MAD, various "outlier rate" definitions
- ProbToProbEvaluator: KLD, KS, AD, CvM, Brier, EMD/Wasserstein
- ProbToPointEvaluator: CDE Loss, CRPS, PIT
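
A minimal sketch of the grouping by input signature; the class names follow the outline above, but the base-class machinery and the metric functions are placeholders rather than the eventual RAIL stage code:

```python
import numpy as np


class BaseEvaluator:
    """One output per evaluator class; `metrics` selects a subset by name."""

    metric_funcs = {}  # filled in by each subclass

    def __init__(self, metrics=None):
        self.metrics = list(metrics) if metrics else list(self.metric_funcs)

    def evaluate(self, estimate, reference):
        return {name: self.metric_funcs[name](estimate, reference)
                for name in self.metrics}


class PointToPointEvaluator(BaseEvaluator):
    """Point estimates vs. true/reference point values."""

    metric_funcs = {
        "bias": lambda est, ref: np.mean(est - ref),
        "mad": lambda est, ref: np.median(
            np.abs((est - ref) - np.median(est - ref))),
    }


# ProbToPointEvaluator and ProbToProbEvaluator would follow the same pattern,
# taking a qp.Ensemble as the estimate (and, for the latter, as the reference).
```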

This plan also calls for a way to generate point estimates from PDFs as a standalone stage, thereby lifting the current restriction that the only way to make point estimates in a pipeline is to do it at the time of estimation.
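
A minimal sketch of what such a standalone step could compute, assuming the input is a qp.Ensemble of per-object PDFs; only pdf evaluation on a grid is used, and the grid bounds are placeholders:

```python
import numpy as np


def point_estimates_from_pdfs(ens, zgrid=None):
    """Grid-mode point estimates from an existing qp.Ensemble of photo-z PDFs,
    decoupled from the estimation stage that produced the ensemble."""
    if zgrid is None:
        zgrid = np.linspace(0.0, 3.0, 301)  # placeholder redshift grid
    pdf_vals = ens.pdf(zgrid)  # expected shape: (n_objects, len(zgrid))
    return zgrid[np.argmax(pdf_vals, axis=1)]
```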

(Note that this functionality has some implications for the formatting of point estimates calculated by an Estimator; @drewoldag and I will outline these in detail in a design document shortly.)

eacharles commented 1 year ago

I think you might consider keeping it all in a single class, and just giving it the option of comparing against truth or a distribution or both.
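
A minimal sketch of that single-class alternative, with a hypothetical `compare_to` option deciding which comparisons run; the metric bodies are placeholders:

```python
class SingleEvaluator:
    """One evaluator stage; a config option decides which comparisons to run."""

    def __init__(self, compare_to="both"):  # "truth", "dist", or "both"
        self.compare_to = compare_to

    def evaluate(self, estimate, truth=None, reference_dist=None):
        results = {}
        if self.compare_to in ("truth", "both") and truth is not None:
            results["vs_truth"] = self._point_metrics(estimate, truth)
        if self.compare_to in ("dist", "both") and reference_dist is not None:
            results["vs_dist"] = self._dist_metrics(estimate, reference_dist)
        return results

    def _point_metrics(self, estimate, truth):
        return {}  # point-estimate comparisons would go here

    def _dist_metrics(self, estimate, reference_dist):
        return {}  # distribution-to-distribution comparisons would go here
```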

joezuntz commented 11 months ago

Hi all. Alex asked me about stages that can have variable numbers of inputs. I'll have a think about this and get back to you.

ztq1996 commented 5 months ago

I want to revisit this next Monday (3/25). Since the codebase has changed, the functionality of the current Evaluator class in src/evaluation/evaluator.py should be fulfilled by the single evaluator class in the eac branch; that would free up the superclass so that the metric evaluator classes can be based on it.

eacharles commented 5 months ago

done with #98