training schedule logic

williamhyin commented 3 years ago

Hi,

I am confused about your training schedule logic in branch "paper".

`python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg models/yolor-p6.yaml --weights '' --sync-bn --device 0,1,2,3,4,5,6,7 --name yolor-p6 --hyp hyp.scratch.1280.yaml --epochs 300

python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 tune.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg models/yolor-p6.yaml --weights 'runs/train/yolor-p6/weights/last_298.pt' --sync-bn --device 0,1,2,3,4,5,6,7 --name yolor-p6-tune --hyp hyp.finetune.1280.yaml --epochs 450

python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg models/yolor-p6.yaml --weights 'runs/train/yolor-p6-tune/weights/epoch_424.pt' --sync-bn --device 0,1,2,3,4,5,6,7 --name yolor-p6-fine --hyp hyp.finetune.1280.yaml --epochs 450`

In third step, why you choose the epoch_424, and train only bis 450 epochs(only train remain 26 epochs )? why not chose the best finetung epoch?

WongKinYiu commented 3 years ago

hmm... it just because modify the code of dataloader to change data augmentation policy in the training process need additional effort. if you could, you can stop second training step when 424th epoch finish training.

williamhyin commented 3 years ago

hmm... it just because modify the code of dataloader to change data augmentation policy in the training process need additional effort. if you could, you can stop second training step when 424th epoch finish training.

Thanks, maybe a training tutorial is much suitable for understanding your idea!

WongKinYiu commented 3 years ago

https://github.com/WongKinYiu/yolor/blob/main/figure/schedule.png

WongKinYiu / yolor

training schedule logic #31