Closed joselotl closed 7 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 96.78%. Comparing base (
0b731ec
) to head (f374463
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I did a small change to the evaluator.py so that we would use catalogs where z_truth is inside an hdf5 group like ['photometry']
I think what most of the estimator codes do is read in the training data as a whole and then grab the redshift from that data in a separate line. But, this doesn't cover the case where the redshift column might be in a different hdf5 group, should we be worried about that at all?
The other thing is, with the code in this PR, if there happened to be a column with name self.config.redshift_col
in the top level hdf5, then the try
statement would grab that even if the intended redshift_col was in the sub-group. I think maybe the better thing to do would be what we do in several of the estimator codes, e.g. this from rail_flexzboost:
if self.config.hdf5_groupname:
training_data = self.get_data('input')[self.config.hdf5_groupname]
else: # pragma: no cover
training_data = self.get_data('input')
speczs = np.array(training_data[self.config['redshift_col']])
@sschmidt23 If I try to use self.config.hdf5_groupname I get an error when it was not previously defined. Or if I add
hdf5_groupname=SHARED_PARAMS['hdf5_groupname']
to the config_options, the test asigns the value "photometry"
when I want it to have a default value of None
I just pushed a change where to get the test to pass I set an hdf5_groupname=""
in the test's make_stage
setup. If you want a default value of hdf5_groupname
of None
or ""
, then maybe we could define a config parameter with a different name from hdf5_groupname, maybe truth_groupname
or something like that to distinguish it from the hdf5_groupname
parameter that we use for the estimators, and then we can set that config parameter to have whatever default we want. I guess it's really a question of whether we think that the truth data is always going to be read from files that follow the same patterns of having and hdf5_groupname that is the same as the data from the estimators. If so, hdf5_groupname
is probably fine, if we think it will occasionally be different, then maybe a unique config param is needed.
Ok, I'm fine with doing it this way, I'll approve the PR, if you think about things further and think we need a "truth_groupname" we can add that in a future PR.
Problem & Solution Description (including issue #)
If the spec-z file has truth-z inside an h5 group (for example ['photometry']['z_true']) the evaluator needs to use hdf5_groupname to access this row.
Code Quality
#pragma: no cover
; in the case of a bugfix, a new test that breaks as a result of the bug has been added