Open alexeifigueroa opened 3 years ago
We can add `or (t_total == global_step)` at L239 to run the checkpoint/validation at the end of training.
Saving happens inside this clause: https://github.com/facebookresearch/bio-lm/blob/552de5c98fc421758f753d73162b9c84f0e755b2/biolm/run_classification.py#L248 so if the score has not improved, the model again won't be saved.
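For illustration, here is a minimal standalone sketch of the proposed condition. It is not the code from `run_classification.py`: `should_checkpoint` and `checkpoint_steps` are hypothetical helpers, and `save_steps` is an assumed name for the save-steps setting. Only `global_step` and `t_total` come from the suggestion above. It shows only when the checkpoint/validation branch would fire, and it does not change the best-score check inside that branch.

```python
from typing import List


def should_checkpoint(global_step: int, save_steps: int, t_total: int) -> bool:
    """Hypothetical helper: fire on every save_steps-th step and on the last step."""
    # Periodic trigger (assumed shape of the existing check).
    periodic = save_steps > 0 and global_step % save_steps == 0
    # Proposed extra clause: always fire once training is complete.
    final = global_step == t_total
    return periodic or final


def checkpoint_steps(save_steps: int, t_total: int) -> List[int]:
    """Simulate a training loop and collect the steps that would checkpoint."""
    return [
        step
        for step in range(1, t_total + 1)
        if should_checkpoint(step, save_steps, t_total)
    ]


if __name__ == "__main__":
    # save_steps larger than the total number of steps: only the final step fires.
    print(checkpoint_steps(save_steps=500, t_total=120))  # [120]
    # save_steps not a divisor of the total: periodic steps plus the final step.
    print(checkpoint_steps(save_steps=50, t_total=120))   # [50, 100, 120]
```

Without the `final` clause, the first case would never reach the checkpoint branch at all, which matches the behaviour reported in this issue.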
If, say, the save steps are greater than the total global steps, the fine-tuned models are not saved (https://github.com/facebookresearch/bio-lm/blob/552de5c98fc421758f753d73162b9c84f0e755b2/biolm/run_classification.py#L239) and the code crashes (https://github.com/facebookresearch/bio-lm/blob/552de5c98fc421758f753d73162b9c84f0e755b2/biolm/run_classification.py#L713). The save steps always have to be set to a smaller number, and there is never a "final" version of the model when training is 100% complete (unless you set the save steps to a divisor of the total steps, if you happen to know them beforehand).