Closed Alkohole closed 7 months ago
Thank you very much!
I can't continue training with the --resume flag. The script says training for 100 epochs is complete and there is nothing to resume:
Traceback (most recent call last):
  File "/workspace/yolov9/train_dual.py", line 644, in <module>
    main(opt)
  File "/workspace/yolov9/train_dual.py", line 538, in main
    train(opt.hyp, opt, device, callbacks)
  File "/workspace/yolov9/train_dual.py", line 174, in train
    best_fitness, start_epoch, epochs = smart_resume(ckpt, optimizer, ema, weights, epochs, resume)
  File "/workspace/yolov9/utils/torch_utils.py", line 469, in smart_resume
    assert start_epoch > 0, f'{weights} training to {epochs} epochs is finished, nothing to resume.\n' \
AssertionError: /workspace/yolov9/runs/train/exp4/weights/best.pt training to 100 epochs is finished, nothing to resume.
Start a new training without --resume, i.e. 'python train.py --weights /workspace/yolov9/runs/train/exp4/weights/best.pt'
The command looks like this:
python train_dual.py --workers 8 --batch 16 --img 640 --epochs 150 --data /workspace/data.yaml --resume /workspace/yolov9/runs/train/exp4/weights/best.pt --device 0 --cfg /workspace/yolov9/models/detect/yolov9_custom.yaml --hyp /workspace/yolov9/data/hyps/hyp.scratch-high.yaml
What am I doing wrong?
For transfer learning: --weights runs/train/exp4/weights/best.pt
For resume training: --resume runs/train/exp4/weights/best.pt
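As far as this thread shows, the two modes differ only in which flag carries the checkpoint path. A minimal sketch of that choice follows. Here build_train_command is a hypothetical helper, not part of the repository, and the other arguments a real run needs (--data, --cfg, --hyp and so on) are left out:

```python
import shlex


def build_train_command(checkpoint, mode):
    """Return an argument list for train_dual.py (hypothetical helper).

    mode="transfer" starts a new run initialised from the checkpoint;
    mode="resume" continues an interrupted run from the checkpoint.
    """
    if mode == "transfer":
        flag = "--weights"
    elif mode == "resume":
        flag = "--resume"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Only the flag that carries the checkpoint is shown; dataset, model
    # config and hyperparameter arguments are omitted for brevity.
    return ["python", "train_dual.py", flag, checkpoint]


if __name__ == "__main__":
    for mode in ("transfer", "resume"):
        print(shlex.join(build_train_command("runs/train/exp4/weights/best.pt", mode)))
```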
It means your first training run of 100 epochs has already finished. Now you want to train for 150 epochs, which cannot be done by resuming the first run, because it was set up with fewer epochs. You need to start a new training run with 150 epochs from the beginning; then you can resume it if it stops partway through.
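The assertion shown in the traceback can be illustrated with a small sketch. It assumes that start_epoch is the checkpoint's stored epoch plus one, and that a finished run saves its checkpoints with the epoch set to -1, as in the YOLOv5-style utilities this code appears to derive from. Neither detail is confirmed in this thread, and check_resumable is a hypothetical stand-in, not the repository's smart_resume:

```python
def check_resumable(ckpt, weights, epochs):
    """Mimic the 'start_epoch > 0' check seen in the traceback.

    Assumption: start_epoch = stored epoch + 1, and a completed run
    stores epoch = -1 in its checkpoint.
    """
    start_epoch = ckpt["epoch"] + 1
    assert start_epoch > 0, (
        f"{weights} training to {epochs} epochs is finished, nothing to resume."
    )
    return start_epoch


# A checkpoint saved partway through a run (hypothetical values).
interrupted = {"epoch": 41}
# A checkpoint from a completed run, under the assumption above.
finished = {"epoch": -1}

if __name__ == "__main__":
    print(check_resumable(interrupted, "last.pt", 100))  # resumes at epoch 42
    try:
        check_resumable(finished, "best.pt", 100)
    except AssertionError as err:
        print(f"cannot resume: {err}")
```

Under these assumptions, passing a larger --epochs value does not help: the check depends on the epoch stored in the checkpoint, not on the requested total.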
Aha, I understand: --resume is for resuming an interrupted training session, not for continuing a completed training session on new data.
Okay, thank you all for your help.
This worked for me: python train_dual.py --workers 1 --device cpu --batch 4 --data datasets/data.yaml --img 640 --cfg models/detect/yolov9-c.yaml --weights '' --name yolov9-c --hyp hyp.scratch-high.yaml --min-items 0 --epochs 10 --close-mosaic 15 --resume runs/train/yolov9-c/weights/last.pt
I ran it after my PC had completed 1 epoch and then shut down. With the command above, 4 epochs remained.
Hello,
Can I use your script to retrain the model on a new dataset?
Can I just run the train_dual.py script with this flag: --weights runs/train/exp4/weights/best.pt?
Thank you for your work and your attention, and I apologize for my English)