Improve code coverage - Githubissues

delucchi-cmu commented 1 year ago

First pass can focus on reducing the total number of lines of code:

reduce duplicated code
remove unused code

With existing unit test cases, we can likely get coverage up to around 40-50 percent just by shrinking the denominator (and should have tighter code organization as a result).

delucchi-cmu commented 1 year ago

This morning's inspection:

Seems unused

utils.get_sn_ra_dec
format_data_ztf
   .import_features_and_labels
   .return_names_from_med_arrays
   .divide_into_training_test_set
   .generate_two_class_labels
   .oversample_minority_classes
   .summarize_misc_classification
   .generate_csv_subset
   .generate_csv_subset2
   .add_snr_to_prob_csv

Potential duplicates

ztf_transient_fit.import_data very similar to similar to fit_numpyro.import_data

create new file_utils.read_single_lightcurve
same as ztf_transient_fit.import_data

import_ztf_from_alerce.get_band_extinctions same as utils.get_band_extinctions

remove import_ztf_from_alerce.get_band_extinctions

ztf_transient_fit.run_mcmc similar plotting.plot_lc_fit

add paramters to plot_lc_fit and remove ztf_transient_fit.run_mcmc

ztf_transient_fit.run_mcmc.flux_model same as utils.flux_model:

remove ztf_transient_fit.run_mcmc.flux_model

OliviaLynn commented 1 year ago

This looks great! I went through my notes and reviewed the things I'd flagged, and what isn't captured here turned out to be not explicitly obsolete.

mlp has a get_predictions(model, iterator, device) and a get_predictions_new(model, iterator, device), which return images, labels, sample_idxs, probs and images, probs, respectively. Both are used at least once, but would probably benefit from renaming.
- I'm not sure this quite fits the theme of this issue, so I'm happy to make a note to address this when I'll be looking at variable names later this week.
import_labels_only and import_features_and_labels (of format_data_ztf) both map groups of alternative label spellings (eg, "SN Ic", "SNIc-BL", and "21" all become "SN Ibc"), but the latter does not translate any number labels. It seems like an unexpected mismatch, and potentially the result of duplicated code only being further developed in one copy of the mapping.
- This relates to a comment Kaylee had made that the code translates string labels to int labels at some point before things get to the classifiers (as classify_ztf and mlp prefer int labels, iirc), and maybe wanting to do that translation earlier in the code (so labels remain consistent throughout). This could potentially benefit from a larger look at how we translate int and string labels back and forth in the code (maybe adding some util functions to avoid duplication?)

delucchi-cmu commented 1 year ago

Thanks! I was also playing around with possibly moving the string/int stuff into an enumeration, and I've opened issue #25 to continue that conversation.

VTDA-Group / superphot-plus

Improve code coverage #24

Seems unused

Potential duplicates