julianmack opened 4 years ago
I've addressed all of the suggested changes, except that there is already an lr_scheduler in master, so I've kept it and handled it accordingly.
Putting this up for re-review.
Is the lr_scheduler in master? If it is, can this PR be rebased on it (I can't see it!)? If it isn't, can the lr_scheduler code be pushed to a different PR?
I think my first comment on the lr_scheduler was unclear @samgd. There is already an lr_scheduler in master, but it wasn't added by me (Giuseppe added it a while back): https://github.com/MyrtleSoftware/myrtlespeech/blob/d4a0003197f4512eb4ca9b1cda74358456a77990/src/myrtlespeech/builders/task_config.py#L94
master does not have my lr_warmup and polynomial_lr changes: these are in an incoming PR (https://github.com/MyrtleSoftware/myrtlespeech/pull/22) and I agree that they should be considered separately!
I will respond to the rest of the above comments shortly, but I wanted to flag this high-level point now.
Ah I see now, thanks. The SeqToSeq class not having an lr_scheduler parameter in __init__, or that parameter documented in the docstring, was what was throwing me off - I had forgotten it was being set in the builder. I'll make an issue for this as it should be explicit and obvious!
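For context, a rough sketch of the pattern being discussed, with the scheduler attached inside a builder rather than declared in SeqToSeq.__init__ (everything here other than the SeqToSeq name is illustrative, not the actual myrtlespeech code):

```python
import torch
from torch import nn


class SeqToSeq:
    """Minimal stand-in for the real class: note no lr_scheduler in __init__."""

    def __init__(self, model: nn.Module, optim: torch.optim.Optimizer):
        self.model = model
        self.optim = optim


def build(model: nn.Module) -> SeqToSeq:
    optim = torch.optim.SGD(model.parameters(), lr=0.1)
    seq_to_seq = SeqToSeq(model, optim)
    # The scheduler is bolted on by the builder rather than appearing in
    # SeqToSeq.__init__ or its docstring, which is why it is easy to overlook.
    seq_to_seq.lr_scheduler = torch.optim.lr_scheduler.StepLR(optim, step_size=10)
    return seq_to_seq


seq_to_seq = build(nn.Linear(4, 4))
assert hasattr(seq_to_seq, "lr_scheduler")
```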
Responded to changes and added a test.
I've added a check_state_dicts_match function in test.utils.utils. Although it is currently used only in the new test, I've placed it here because I want to use it in another test in my incoming lr_warmup PR.
I've also disabled the builders/test_task_config.py test, as this functionality is necessarily tested at the start of the new Saver test.
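A rough sketch of what such a check_state_dicts_match helper could look like; the actual implementation in test.utils.utils may differ:

```python
import torch


def check_state_dicts_match(state_dict_a: dict, state_dict_b: dict) -> None:
    """Assert two (possibly nested) state dicts hold identical values.

    Illustrative sketch: tensors are compared with torch.equal, nested dicts
    are compared recursively, and everything else with ==.
    """
    assert state_dict_a.keys() == state_dict_b.keys()
    for key, value_a in state_dict_a.items():
        value_b = state_dict_b[key]
        if isinstance(value_a, torch.Tensor):
            assert torch.equal(value_a, value_b), f"mismatch at {key!r}"
        elif isinstance(value_a, dict):
            check_state_dicts_match(value_a, value_b)
        else:
            assert value_a == value_b, f"mismatch at {key!r}"
```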
Also on the above point about the lr_scheduler not being explicit in SeqToSeq: in the lr_warmup PR, I've refactored the lr_scheduler creation and added a proper builder with Sphinx docs.
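As an illustration only, a dedicated lr_scheduler builder with a Sphinx-style docstring might look roughly like this (the signature and parameters are assumptions, not the code in the lr_warmup PR):

```python
import torch
from torch.optim.lr_scheduler import StepLR


def build_lr_scheduler(
    optimizer: torch.optim.Optimizer, step_size: int, gamma: float = 0.1
) -> StepLR:
    """Builds a learning-rate scheduler for ``optimizer``.

    Args:
        optimizer: The optimizer whose learning rate will be scheduled.
        step_size: Number of epochs between learning-rate decays.
        gamma: Multiplicative decay factor applied every ``step_size`` epochs.

    Returns:
        A :py:class:`torch.optim.lr_scheduler.StepLR` instance.
    """
    return StepLR(optimizer, step_size=step_size, gamma=gamma)
```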
Responded to comments. A few outstanding discussions.
Small PR updating the seq_to_seq state dict to include the learning-rate scheduler and optimizer state so that crashed training runs can be resumed w/o disruption.
I've also added a helper function that, given a state_dict filepath, loads the dict into a seq_to_seq instance and returns the training state necessary for the CallbackHandler. It needs this state so that the Saver callback can save with the naming convention state_dict_x and the TensorBoardLogger can resume from the same step.
I have not added tests for this functionality here - but I have added one in the lr scheduler PR: https://github.com/MyrtleSoftware/myrtlespeech/pull/22 (which is blocked by the rnnt PR). The reason for this is that the state_dict restoration is relatively simple in this case but much more involved once learning-rate warm-up is added.
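A minimal sketch of the save/restore pattern described above (function names, attribute names, and checkpoint keys are illustrative, not the actual myrtlespeech API):

```python
import torch


def save_training_state(seq_to_seq, step: int, path: str) -> None:
    # Bundle everything needed to resume a crashed run into one checkpoint:
    # model weights, optimizer state, lr scheduler state, and the step count.
    torch.save(
        {
            "model": seq_to_seq.model.state_dict(),
            "optim": seq_to_seq.optim.state_dict(),
            "lr_scheduler": seq_to_seq.lr_scheduler.state_dict(),
            "step": step,
        },
        path,
    )


def restore_training_state(seq_to_seq, path: str) -> dict:
    """Load the checkpoint at ``path`` into ``seq_to_seq`` and return the
    training state the callbacks need: the step gives the Saver callback the
    index for its state_dict_x naming convention and lets the TensorBoard
    logger resume from the same point."""
    checkpoint = torch.load(path)
    seq_to_seq.model.load_state_dict(checkpoint["model"])
    seq_to_seq.optim.load_state_dict(checkpoint["optim"])
    seq_to_seq.lr_scheduler.load_state_dict(checkpoint["lr_scheduler"])
    return {"step": checkpoint["step"]}
```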