MathGaron / pytorch_toolbox

Boilerplate code for PyTorch: train/validation loops, visualization, etc. For research.
MIT License

Broken after some epochs #24

Closed (jacenfox closed 5 years ago)

jacenfox commented 6 years ago

https://github.com/MathGaron/pytorch_toolbox/blob/7aaca9faa0738f7b6c464f3c81757fe9fbebfcc4/pytorch_toolbox/visualization/tensorboard_logger.py#L121 This is an issue from numpy when computing the histograms. I guess it's because of bins=auto, but I didn't check. Sometimes it works pretty well, sometimes it just fails.

MathGaron commented 6 years ago

Either you give me information about how to reproduce the error, or you fix it :)

jacenfox commented 6 years ago

Traceback (most recent call last):
  File "train_pv_torch.py", line 199, in main
    train_loop_handler.loop(n_epochs, model_dir, load_last_checkpoint=resume, save_best_checkpoint=True, save_last_checkpoint=True, save_all_checkpoints=False)
  File "../../pytorch_toolbox/pytorch_toolbox/train_loop.py", line 264, in loop
    self.tensorboard_logger.histo_summary(tag + '/grad', self.to_np(value.grad), epoch + 1)
  File "../../pytorch_toolbox/pytorch_toolbox/visualization/tensorboard_logger.py", line 125, in histo_summary
    self.writer_train.add_histogram(tag, values, step, bins=bins)
  File "/gel/usr/jizha16/kekek/pyvenv3/lib/python3.5/site-packages/tensorboardX/writer.py", line 324, in add_histogram
    self.file_writer.add_summary(histogram(tag, values, bins), global_step)
  File "/gel/usr/jizha16/kekek/pyvenv3/lib/python3.5/site-packages/tensorboardX/summary.py", line 112, in histogram
    hist = make_histogram(values.astype(float), bins)
  File "/gel/usr/jizha16/kekek/pyvenv3/lib/python3.5/site-packages/tensorboardX/summary.py", line 119, in make_histogram
    counts, limits = np.histogram(values, bins=bins)
  File "/gel/usr/jizha16/kekek/pyvenv3/lib/python3.5/site-packages/numpy/lib/function_base.py", line 737, in histogram
    first_edge, last_edge, n_equal_bins + 1, endpoint=True)
  File "/gel/usr/jizha16/kekek/pyvenv3/lib/python3.5/site-packages/numpy/core/function_base.py", line 115, in linspace
    y = _nx.arange(0, num, dtype=dt)
ValueError: Maximum allowed size exceeded

jacenfox commented 6 years ago

That's the copy-pasted error. It doesn't happen every time. I can fix it later this week. If you want to use tbX, go for it. It works!
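
The thread never confirms the root cause, but the bins=auto guess is plausible. NumPy documents bins='auto' as choosing between the Sturges and Freedman-Diaconis estimators, and the Freedman-Diaconis bin width shrinks with the interquartile range. A gradient tensor that is mostly tiny values plus one large outlier can therefore ask for far more bins than can be allocated. The pure-Python sketch below only illustrates that arithmetic. It is not NumPy's actual code, and `percentile`, `fd_bin_count`, `safe_bin_count` and the `max_bins` cap are hypothetical names made up for this example.

```python
import math


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def fd_bin_count(values):
    """Bin count implied by the Freedman-Diaconis rule.

    Bin width is 2 * IQR / n^(1/3); the count is the data range divided
    by that width. Simplification (assumption): degenerate input with a
    zero width or zero range gets a single bin.
    """
    vals = sorted(values)
    iqr = percentile(vals, 75) - percentile(vals, 25)
    width = 2.0 * iqr / len(vals) ** (1.0 / 3.0)
    span = vals[-1] - vals[0]
    if width == 0 or span == 0:
        return 1
    return math.ceil(span / width)


def safe_bin_count(values, max_bins=512):
    """Hypothetical guard: drop NaN/inf, then cap the bin count."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return 1
    return min(fd_bin_count(finite), max_bins)


# Gradient-like data: 1000 tiny values plus a single huge outlier.
grads = [i * 1e-9 for i in range(1000)] + [1e6]
print(fd_bin_count(grads))    # trillions of bins requested
print(safe_bin_count(grads))  # capped at max_bins
```

If this is what happens here, one possible workaround is to pass a fixed integer bin count to the histogram call, or to filter and clip the values first, instead of relying on the automatic estimator. Whether that suits this toolbox is untested.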