InfluenceFunctional / MXtalTools

BSD 3-Clause "New" or "Revised" License

crash error in discriminator training #73

Closed InfluenceFunctional closed 1 year ago

InfluenceFunctional commented 1 year ago

After many OOM epochs, the reporting code crashes with the following traceback:


Traceback (most recent call last):
  File "/scratch/mk8347/mcrygan/main.py", line 31, in <module>
    predictor.train_crystal_models()
  File "/scratch/mk8347/mcrygan/crystal_modeller.py", line 528, in train_crystal_models
    self.logger.log_epoch_analysis(test_loader)
  File "/scratch/mk8347/mcrygan/reporting/logger.py", line 174, in log_epoch_analysis
    detailed_reporting(self.config, self.dataDims, test_loader, self.test_stats, extra_test_dict=self.extra_stats)
  File "/scratch/mk8347/mcrygan/reporting/online.py", line 1349, in detailed_reporting
    discriminator_BT_reporting(dataDims, wandb, test_epoch_stats_dict, extra_test_dict)
  File "/scratch/mk8347/mcrygan/reporting/online.py", line 917, in discriminator_BT_reporting
    make_and_plot_BT_figs(crystals_for_targets, target_identifiers_inds, identifiers_list,
  File "/scratch/mk8347/mcrygan/reporting/online.py", line 847, in make_and_plot_BT_figs
    fig = make_correlates_plot(tracking_features, scores_dict['CSD'], dataDims)
  File "/scratch/mk8347/mcrygan/reporting/online.py", line 1321, in make_correlates_plot
    fig.update_xaxes(range=[np.amin(list(g_loss_dict.values())), np.amax(list(g_loss_dict.values()))])
  File "<__array_function__ internals>", line 200, in amin
  File "/scratch/mk8347/crystal_learning/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2946, in amin
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
  File "/scratch/mk8347/crystal_learning/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity

InfluenceFunctional commented 1 year ago

This is distinct from the OOM cascade. It looks like some reporting data from an earlier epoch is missing.

InfluenceFunctional commented 1 year ago

Actually, g_loss_dict is just empty, so np.amin in make_correlates_plot is called on a zero-size array.
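
For illustration only, here is a minimal sketch of one way to guard against an empty dict before computing the axis range. It is not necessarily the fix that was applied. The helper name safe_axis_range is hypothetical, and the sketch assumes the dict values are scalar losses.

```python
import math


def safe_axis_range(values_by_key):
    """Return [min, max] over the finite dict values, or None if there are none.

    Hypothetical helper: assumes every value is a scalar number.
    """
    finite = []
    for value in values_by_key.values():
        number = float(value)
        # Skip NaN/inf entries, which would also give an unusable axis range.
        if math.isfinite(number):
            finite.append(number)
    if not finite:
        # Nothing to reduce: the caller should leave the axis on autorange.
        return None
    return [min(finite), max(finite)]


# Possible use inside a plotting function (fig and g_loss_dict come from the caller):
#     axis_range = safe_axis_range(g_loss_dict)
#     if axis_range is not None:
#         fig.update_xaxes(range=axis_range)
```

Returning None instead of raising lets the report for that epoch still be generated, with the plot falling back to its default axis range.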

InfluenceFunctional commented 1 year ago

fixed