asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
https://asyml.io
Apache License 2.0

Fix bugs in Executor #306

Closed huzecong closed 4 years ago

huzecong commented 4 years ago

This PR fixes a bunch of bugs in the Executor module:

Missing call to tracker in _validate_loop

This is so stupid: for some reason I forgot to call _valid_tracker.add in _validate_loop, so the status is never updated during validation.
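
A minimal sketch of the fixed loop shape, for illustration only. `ProgressTracker` and `validate_loop` are hypothetical stand-ins, not the actual Executor code; the only point is that the tracker is updated once per validation batch.

```python
class ProgressTracker:
    """Hypothetical stand-in for the Executor's tracker; counts examples."""

    def __init__(self):
        self.n_examples = 0

    def add(self, n_samples):
        self.n_examples += n_samples


def validate_loop(batches, tracker):
    """Iterate over validation batches and update the tracker after each."""
    for batch in batches:
        # ... the model forward pass and metric updates would go here ...
        tracker.add(len(batch))  # the kind of call that was missing
    return tracker.n_examples


valid_tracker = ProgressTracker()
processed = validate_loop([[1, 2, 3], [4, 5]], valid_tracker)
```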

Files closed prematurely when test is called within train

_open_files and _close_files are called at the beginning and end of train and test, to prevent holding on to an open file object for an unnecessarily long amount of time.

However, test may be called within train, for instance from an action triggered by the validation event. In that case, the files are closed when test returns, before training ends.

Solution: Check whether the files are already open. If they are, skip both opening and closing them, so that only the outermost call manages the files.
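
One way to sketch this guard, assuming a simplified owner class. `FileOwner` and its single log file are hypothetical; only the method names `_open_files`, `_close_files`, `train`, and `test` come from the description above, and their bodies here are not the Executor's actual implementation.

```python
import os
import tempfile


class FileOwner:
    """Hypothetical stand-in for the Executor's log-file handling."""

    def __init__(self, path):
        self.path = path
        self.file = None

    def _open_files(self):
        """Open the file if needed; return True only if this call opened it."""
        if self.file is not None:
            return False
        self.file = open(self.path, "a")
        return True

    def _close_files(self):
        self.file.close()
        self.file = None

    def test(self):
        opened = self._open_files()
        try:
            self.file.write("test\n")
        finally:
            if opened:  # close only what this call opened
                self._close_files()

    def train(self):
        opened = self._open_files()
        try:
            self.file.write("train step\n")
            self.test()  # nested call, e.g. from a validation-event action
            self.file.write("train step\n")  # the file must still be open
        finally:
            if opened:
                self._close_files()


log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
owner = FileOwner(log_path)
owner.train()  # test runs inside train and leaves the file open
owner.test()   # a standalone test opens and closes the file itself
with open(log_path) as f:
    lines = f.read().splitlines()
```

Returning a flag from `_open_files` keeps the ownership decision local to each call, so the nested `test` never closes a file that `train` still needs.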

Saved meta-info contains large bulks of data

The saved meta-info in the checkpoints directory contains the training and evaluation metrics at the time of save, but we were actually saving the metric objects. This is unnecessary as we only need the values.

What's more, some metric objects store references to large objects. For instance, the LR metric stored a reference to the optimizer, which holds a bunch of weights. This made the meta-info even larger than the checkpoints themselves.

Solution:

  1. Save only metric values.
  2. Store the optimizer as a weakref in LR.
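
Both points can be sketched together under some simplifying assumptions. `Optimizer`, `LRMetric`, and `metric_values` below are hypothetical stand-ins, not the texar classes, and `pickle` is used only to show the size of the saved data; the actual serialization format is not stated here.

```python
import pickle
import weakref


class Optimizer:
    """Hypothetical optimizer holding a large buffer of "weights"."""

    def __init__(self, lr):
        self.lr = lr
        self.state = bytearray(1_000_000)


class LRMetric:
    """Hypothetical LR metric that does not keep the optimizer alive."""

    def __init__(self, optimizer):
        self._optimizer = weakref.ref(optimizer)

    def value(self):
        optimizer = self._optimizer()  # returns None once it is collected
        if optimizer is None:
            raise RuntimeError("optimizer has been garbage-collected")
        return optimizer.lr


def metric_values(metrics):
    """Collect plain values so the meta-info holds no metric objects."""
    return {name: metric.value() for name, metric in metrics.items()}


optimizer = Optimizer(lr=0.001)
metrics = {"lr": LRMetric(optimizer)}
meta_info = pickle.dumps(metric_values(metrics))
```

Because only the value dictionary is serialized, the saved meta-info stays at a few dozen bytes here, regardless of how large the optimizer state is.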

codecov[bot] commented 4 years ago

Codecov Report

Merging #306 into master will increase coverage by 0.09%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #306      +/-   ##
==========================================
+ Coverage   79.82%   79.91%   +0.09%     
==========================================
  Files         133      133              
  Lines       11122    11135      +13     
==========================================
+ Hits         8878     8899      +21     
+ Misses       2244     2236       -8     

Impacted Files                           Coverage Δ
texar/torch/run/executor.py              79.08% <100.00%> (+0.72%)
texar/torch/run/metric/summary.py        92.95% <100.00%> (+8.10%)
texar/torch/run/executor_utils.py        83.52% <0.00%> (-0.39%)
texar/torch/run/metric/base_metric.py    75.00% <0.00%> (+1.31%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 27fe398...16806ae.