Closed delucchi-cmu closed 1 year ago
This morning's inspection:
utils.get_sn_ra_dec
format_data_ztf
.import_features_and_labels
.return_names_from_med_arrays
.divide_into_training_test_set
.generate_two_class_labels
.oversample_minority_classes
.summarize_misc_classification
.generate_csv_subset
.generate_csv_subset2
.add_snr_to_prob_csv
ztf_transient_fit.import_data very similar to similar to fit_numpyro.import_data
import_ztf_from_alerce.get_band_extinctions same as utils.get_band_extinctions
ztf_transient_fit.run_mcmc similar plotting.plot_lc_fit
ztf_transient_fit.run_mcmc.flux_model same as utils.flux_model:
This looks great! I went through my notes and reviewed the things I'd flagged, and what isn't captured here turned out to be not explicitly obsolete.
get_predictions(model, iterator, device)
and a get_predictions_new(model, iterator, device)
, which return images, labels, sample_idxs, probs
and images, probs
, respectively. Both are used at least once, but would probably benefit from renaming.
import_labels_only
and import_features_and_labels
(of format_data_ztf) both map groups of alternative label spellings (eg, "SN Ic", "SNIc-BL", and "21" all become "SN Ibc"), but the latter does not translate any number labels. It seems like an unexpected mismatch, and potentially the result of duplicated code only being further developed in one copy of the mapping.
string
labels to int
labels at some point before things get to the classifiers (as classify_ztf and mlp prefer int labels, iirc), and maybe wanting to do that translation earlier in the code (so labels remain consistent throughout). This could potentially benefit from a larger look at how we translate int and string labels back and forth in the code (maybe adding some util functions to avoid duplication?)Thanks! I was also playing around with possibly moving the string/int stuff into an enumeration, and I've opened issue #25 to continue that conversation.
First pass can focus on reducing the total number of lines of code:
With existing unit test cases, we can likely get coverage up to around 40-50 percent just by shrinking the denominator (and should have tighter code organization as a result).