Resuming training from checkpoint

Hi, by default t5chem saves checkpoints automatically under the output directory (--output_dir). For example, you may see:

-- model/
    | -- checkpoint-10000/
    | -- runs/

Here the checkpoint-10000/ folder contains all the files needed to resume training (for example, if training was accidentally interrupted). To resume, all you need to do is pass the checkpoint path to the --pretrain argument. For example:

t5chem train --data_dir data/sample/product/ --output_dir model/ --pretrain model/checkpoint-10000/ --task_type product --num_epoch 30

Training will then resume from step 10000. Hope it helps!
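
If several checkpoints have accumulated under the output directory, a small helper can pick the most recent one to pass to --pretrain. The following is only a sketch, not part of t5chem: the function name latest_checkpoint is hypothetical, and it assumes checkpoint folders are named checkpoint-<step> as in the listing above.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subfolder with the highest step, or None.

    Hypothetical helper; assumes folders are named like checkpoint-10000.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        # Only consider directories whose name is exactly checkpoint-<digits>.
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

Calling latest_checkpoint("model/") on the layout above would return the path model/checkpoint-10000, which can then be given to --pretrain; other folders such as runs/ are ignored.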
