Closed SamuelLarkin closed 2 weeks ago
SemanticDiff analyzed 1 of 1 files.

Filename | Status
---|---
everyvoice/base_cli/helpers.py | :heavy_check_mark: Analyzed
CLI load time: 0:00.23
Pull Request HEAD: 7cce58cb74a59ca919153ce22f72e49f4ee64024
Imports that take more than 0.1 s: none.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 74.63%. Comparing base (3a36240) to head (7cce58c). Report is 1 commit behind head on main.
Yes, confirming that the fine-tune checkpoint is resuming from the end of the previous run (50 steps ahead), versus how it was definitely overlapping before.
I will open a new ticket for the 50-steps-ahead issue but will close this since it is now resolved. :-)
PR Goal?
Fix proper resuming of text-to-spec training. The state at the end of the last epoch wasn't saved, so resuming would start from the last saved checkpoint, which was the last checkpoint used for validation. This was producing staggered runs, as shown in tensorboard.
Fixes?
#534
Feedback sought?
merge approval
Priority?
low
Tests added?
None
How to test?
Check the state of the loops in the checkpoint, which will yield something like the following. You want to look at current's values. This run used 11790 training examples split across batches of 16 examples, so one epoch is 11790/16 ≈ 736 batches. If, instead, we see 500, the default val_check_interval, this would mean that we didn't save at the end of the epoch.
Try resuming for a second epoch.
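To check the loop state, the checkpoint can be inspected directly. A minimal sketch, assuming a hypothetical checkpoint path and the standard PyTorch Lightning checkpoint layout (a `loops` key whose `fit_loop` entry carries nested `current` progress counters); the batch arithmetic matches the run described above:

```python
import os

# Hypothetical path -- point this at your actual text-to-spec checkpoint.
CKPT_PATH = "logs/last.ckpt"

def batches_per_epoch(num_examples: int, batch_size: int) -> int:
    """Batches per epoch when the DataLoader drops the last partial batch."""
    return num_examples // batch_size

# The run described above: 11790 examples in batches of 16.
print(batches_per_epoch(11790, 16))  # -> 736

if os.path.exists(CKPT_PATH):
    import torch  # only needed when actually inspecting a checkpoint

    ckpt = torch.load(CKPT_PATH, map_location="cpu")
    # Lightning stores the trainer's loop progress under "loops";
    # the nested "current" counters show where training stopped.
    print(ckpt["loops"]["fit_loop"])
```

If the `current` batch counter sits at a multiple of val_check_interval rather than at an epoch boundary, the end-of-epoch state was not saved.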
Use tensorboard and check that the second run's training is NOT staggered with your first run.
Confidence?
Good
Version change?
No
Related PRs?
None