Closed by liyunlongaaa 2 years ago
Hi there,
Did the machine finish the first epoch? If so, you should be able to find the saved checkpoint in the experiment path. In addition, when you train with a large dataset (with more than 200k samples), the script also saves the optimizer states. https://github.com/YuanGongND/ast/blob/87a80043154eb4bb34ebceb4dc3e2d91a99235f4/src/traintest.py#L210-L216
The training progress is also saved at https://github.com/YuanGongND/ast/blob/87a80043154eb4bb34ebceb4dc3e2d91a99235f4/src/traintest.py#L39-L43
You should be able to use the files above together with torch.load and then torch.nn.DataParallel to load the model and continue training, but this repo does not have an interface for resuming training.
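For anyone trying this, here is a minimal sketch of the loading step. It is not part of the repo. The helper name load_for_resume and the file names audio_model.<epoch>.pth and optim_state.<epoch>.pth are assumptions for illustration, so check the linked lines of traintest.py for the names your run actually wrote. It also assumes the checkpoint was saved from a model wrapped in torch.nn.DataParallel, which is why the model is wrapped before the weights are loaded.

```python
import os

import torch
import torch.nn as nn


def load_for_resume(model, optimizer, exp_dir, epoch, device="cpu"):
    """Restore model weights (and optimizer state, if present) for one epoch.

    Hypothetical helper: the file layout below is an assumption, not a
    documented interface of the repo.
    """
    model_path = os.path.join(exp_dir, "models", "audio_model.%d.pth" % epoch)
    optim_path = os.path.join(exp_dir, "models", "optim_state.%d.pth" % epoch)

    # Wrap first so parameter names carry the "module." prefix that a
    # DataParallel-saved state dict uses.
    if not isinstance(model, nn.DataParallel):
        model = nn.DataParallel(model)

    model.load_state_dict(torch.load(model_path, map_location=device))

    # The optimizer state may be missing (e.g. for smaller datasets).
    if os.path.exists(optim_path):
        optimizer.load_state_dict(torch.load(optim_path, map_location=device))

    return model, optimizer
```

After loading, training would restart from epoch + 1. Anything not stored in these files, such as a learning-rate scheduler's state, would have to be set again by hand.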
For training with lower computational overhead, you could consider (1) fine-tuning our AudioSet-pretrained model on your dataset (please check the ESC-50 recipe), and/or (2) using a smaller or no overlap in the patch split, i.e., setting fstride=16 and tstride=16 when you instantiate the AST model.
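To see why a larger stride helps, here is a rough back-of-the-envelope sketch of how many patches the model has to process. It assumes 16x16 patches extracted without padding and an input of 128 mel bins by 1024 frames (the AudioSet setting; your dataset may use a different length). The function name num_patches is made up for this illustration.

```python
def num_patches(n_mels, n_frames, fstride, tstride, patch=16):
    """Count patches from an unpadded sliding window over the spectrogram."""
    f_dim = (n_mels - patch) // fstride + 1   # patches along frequency
    t_dim = (n_frames - patch) // tstride + 1  # patches along time
    return f_dim * t_dim


overlap = num_patches(128, 1024, fstride=10, tstride=10)
no_overlap = num_patches(128, 1024, fstride=16, tstride=16)
print(overlap, no_overlap)  # 1212 512
```

Since self-attention cost grows roughly with the square of the sequence length, going from 1212 to 512 patches should make each step noticeably cheaper, at some cost in accuracy.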
-Yuan
wow, thank you very much !! love you
Hi friend, I'm a beginner. For some reason, I can only train on a laptop, but halfway through training the computer restarts because the temperature gets too high. How should I continue training? Thank you for your help!