Fix bugs in Executor

asyml / texar-pytorch

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Apache License 2.0

744 stars 118 forks

This PR fixes a bunch of bugs in the Executor module:

Missing call to tracker in `_validate_loop`

This is so stupid: for some reason I forgot to call _valid_tracker.add in _validate_loop, so the status is never updated during validation.
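A minimal sketch of the fix, assuming a tracker whose `add` method takes the number of samples in the current batch. `ProgressTracker` and `validate_loop` are hypothetical stand-ins for illustration, not the actual texar-pytorch classes:

```python
class ProgressTracker:
    """Hypothetical stand-in for the object behind `_valid_tracker`."""

    def __init__(self):
        self.n_samples = 0
        self.n_batches = 0

    def add(self, n_samples):
        # Record that one more batch of `n_samples` examples was processed.
        self.n_samples += n_samples
        self.n_batches += 1


def validate_loop(batches, tracker):
    """Simplified validation loop (assumed shape, not the real method)."""
    for batch in batches:
        # ... the forward pass and metric updates would go here ...
        tracker.add(len(batch))  # the call that was missing before the fix


tracker = ProgressTracker()
validate_loop([[1, 2, 3], [4, 5]], tracker)
print(tracker.n_batches, tracker.n_samples)
```

Without the `tracker.add` line, both counters stay at zero for the whole validation pass, which matches the symptom described above.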

Files closed prematurely when `test` is called in `train`.

_open_files and _close_files are called at the beginning and end of train and test, to prevent holding on to an open file object for an unnecessarily long amount of time.

However, it's possible that we call test within train, for instance by calling test in an action triggered by the validation event. In this case, the files are closed when test returns, before training ends.

Solution: Check whether we need to open files, and if we don't, then don't open or close them.
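One way to express this check is sketched below. It assumes `_open_files` reports whether the current call actually opened the files, so that only that caller closes them. The return value, the `FileOwner` class, and the in-memory log are assumptions made for illustration and may differ from the real Executor:

```python
import io


class FileOwner:
    """Hypothetical, stripped-down stand-in for the Executor."""

    def __init__(self):
        self._log_file = None
        self.last_log = None

    def _open_files(self):
        # Return True only if this call opened the file.
        if self._log_file is not None:
            return False
        self._log_file = io.StringIO()  # stands in for a real file on disk
        return True

    def _close_files(self):
        self.last_log = self._log_file.getvalue()
        self._log_file.close()
        self._log_file = None

    def test(self):
        opened_here = self._open_files()
        try:
            self._log_file.write("test\n")
        finally:
            if opened_here:  # leave the file alone if the caller owns it
                self._close_files()

    def train(self):
        opened_here = self._open_files()
        try:
            self._log_file.write("train start\n")
            self.test()  # e.g. an action triggered by the validation event
            # Writing here would fail if test() had closed the file.
            self._log_file.write("train end\n")
        finally:
            if opened_here:
                self._close_files()


executor = FileOwner()
executor.train()
train_log = executor.last_log
executor.test()
test_log = executor.last_log
```

With this pattern a standalone `test` call still opens and closes its own files, while a nested call leaves them to the enclosing `train`.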

Saved `meta-info` contains large bulks of data

The saved meta-info in the checkpoints directory contains the training and evaluation metrics at the time of save, but we were actually saving the metric objects. This is unnecessary as we only need the values.

What's more, some metric objects store references to large objects. For instance, the LR metric stored a reference to the optimizer, which holds a bunch of weights. This made the meta-info even larger in size than the checkpoints.

Solution:

Save only metric values.
Store the optimizer as a weakref in LR.
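Both points can be sketched together. The code below is an illustration under assumptions: `Optimizer` is a plain stand-in class (not `torch.optim`), `LR.value` and `metric_values` are hypothetical names, and the meta-info is serialized with `pickle` only for demonstration:

```python
import pickle
import weakref


class Optimizer:
    """Stand-in optimizer that holds a large amount of state."""

    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]
        self.weights = [0.0] * 100_000  # pretend these are model weights


class LR:
    """Hypothetical learning-rate metric holding the optimizer weakly."""

    def __init__(self, optimizer):
        # A weak reference does not keep the optimizer alive and is not
        # dragged along when only the metric's value is saved.
        self._optimizer = weakref.ref(optimizer)

    def value(self):
        optimizer = self._optimizer()
        if optimizer is None:
            raise RuntimeError("optimizer has been garbage-collected")
        return optimizer.param_groups[0]["lr"]


def metric_values(metrics):
    # Keep only plain values, not the metric objects themselves.
    return {name: metric.value() for name, metric in metrics.items()}


optimizer = Optimizer(lr=0.001)
metrics = {"lr": LR(optimizer)}
meta_info = {"metrics": metric_values(metrics)}
payload = pickle.dumps(meta_info)
print(meta_info, len(payload))
```

Because `meta_info` contains only a float per metric, its serialized size no longer depends on the size of the optimizer state.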

Codecov Report

Merging #306 into master will increase coverage by 0.09%. The diff coverage is 100.00%.

Coverage Diff	master	#306	+/-
Coverage	79.82%	79.91%	+0.09%
Files	133	133	
Lines	11122	11135	+13
Hits	8878	8899	+21
Misses	2244	2236	-8

Impacted Files	Coverage Δ
texar/torch/run/executor.py	`79.08% <100.00%> (+0.72%)`	:arrow_up:
texar/torch/run/metric/summary.py	`92.95% <100.00%> (+8.10%)`	:arrow_up:
texar/torch/run/executor_utils.py	`83.52% <0.00%> (-0.39%)`	:arrow_down:
texar/torch/run/metric/base_metric.py	`75.00% <0.00%> (+1.31%)`	:arrow_up:

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 27fe398...16806ae.
