HelloJocelynLu / t5chem

Transformer-based model for chemical reactions
MIT License

Resuming training from checkpoint #6

Closed: ruslankotl closed this issue 1 year ago

ruslankotl commented 1 year ago

Hi,

I was wondering if the T5Chem CLI had a provision to resume training from a checkpoint?

HelloJocelynLu commented 1 year ago

Hi, by default the model saves checkpoints automatically under the output directory (--output_dir). For example, you may see:

-- model/
    | -- checkpoint-10000/
    | -- runs/

Here, the checkpoint-10000/ folder contains all the files needed to resume training (for example, if training was accidentally interrupted). To resume, all you need to do is pass its path to the --pretrain argument. For example:

t5chem train --data_dir data/sample/product/ --output_dir model/ --pretrain model/checkpoint-10000/ --task_type product --num_epoch 30

Then the training will be resumed from the 10000th step. Hope it helps!
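
If several checkpoints have accumulated under the output directory, a small shell helper can pick the most recent one to pass to --pretrain. This is only a sketch: latest_checkpoint is a hypothetical function, not part of t5chem, and it assumes checkpoint folders are always named checkpoint-<step> as in the listing above.

```shell
# Hypothetical helper (not part of t5chem): print the highest-numbered
# checkpoint-<step>/ directory found under the given output directory.
# Assumes the checkpoint-<step> naming shown in the directory listing above.
latest_checkpoint() {
    local output_dir="${1%/}"   # drop a trailing slash, if any
    local best=""
    local best_step=-1
    local dir
    local step
    for dir in "$output_dir"/checkpoint-*/; do
        [ -d "$dir" ] || continue            # glob matched nothing
        step="${dir%/}"                      # strip the trailing slash
        step="${step##*checkpoint-}"         # keep only the step suffix
        case "$step" in
            ''|*[!0-9]*) continue ;;         # skip non-numeric suffixes
        esac
        if [ "$step" -gt "$best_step" ]; then
            best_step="$step"
            best="${dir%/}"
        fi
    done
    if [ -n "$best" ]; then
        printf '%s/\n' "$best"
    else
        return 1                             # no checkpoint found
    fi
}

# Usage, with the same flags as the command above:
#   t5chem train --data_dir data/sample/product/ --output_dir model/ \
#       --pretrain "$(latest_checkpoint model/)" --task_type product --num_epoch 30
```

The helper compares step numbers numerically rather than sorting names as text, so checkpoint-10000/ is correctly ranked above checkpoint-2000/.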