WongKinYiu / yolor

Implementation of the paper "You Only Learn One Representation: Unified Network for Multiple Tasks" (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0

Can't evolve hyp #58

Open WANGCHIENCHIH opened 2 years ago

WANGCHIENCHIH commented 2 years ago

I want to evolve my hyperparameters with my data on multiple GPUs:

python -m torch.distributed.launch --nproc_per_node 4 --master_port 9527 train.py --batch-size 64 --img 1280 1280 --data ../data.yaml --cfg ../data.cfg --weights '' --device 0,1,2,3 --sync-bn --name yolor_p6-v2 --epochs 15 --single-cls --evolve

and I got:

Traceback (most recent call last):
  File "train.py", line 571, in <module>
    assert opt.local_rank == -1, 'DDP mode not implemented for --evolve'
AssertionError: DDP mode not implemented for --evolve
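(For context: torch.distributed.launch spawns one worker process per GPU and passes --local_rank to each of them, so under DDP it is no longer -1 and train.py rejects the run up front. A rough, paraphrased sketch of that guard, whose exact wording may differ by commit:

import argparse

# paraphrased sketch of the single-process guard in train.py
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=-1,
                    help='set automatically by torch.distributed.launch, one value per worker')
parser.add_argument('--evolve', action='store_true')
opt = parser.parse_args()

if opt.evolve:
    # every DDP worker gets local_rank >= 0, so this assertion fires on all ranks
    assert opt.local_rank == -1, 'DDP mode not implemented for --evolve'
)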

and if I change it to:

python train.py --batch-size 32 --img 1280 1280 --data ../data.yaml --cfg ../data.cfg --weights '' --device 0,1,2,3 --name yolor_p6-v2 --epochs 15 --sync-bn --evolve --single-cls

I got:

Traceback (most recent call last):
  File "train.py", line 611, in <module>
    results = train(hyp.copy(), opt, device, wandb=wandb)
  File "train.py", line 288, in train
    loss, loss_items = compute_loss(pred, targets.to(device), model)  # loss scaled by batch_size
  File "/workspace/yolor/utils/loss.py", line 66, in compute_loss
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  File "/workspace/yolor/utils/loss.py", line 145, in build_targets
    r = t[None, :, 4:6] / anchors[:, None]  # wh ratio
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
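(That error means the targets ended up on cuda:0 while the anchor tensor stayed on the CPU. A minimal stand-alone reproduction and one common workaround; this is a sketch, not the repo's official fix, and the tensor shapes are illustrative:

import torch

targets = torch.rand(8, 6, device='cuda:0')                   # stand-in for the targets tensor on the GPU
anchors = torch.tensor([[10., 13.], [16., 30.], [33., 23.]])  # anchor wh pairs, still on the CPU

t = targets  # in build_targets this would be targets scaled by the grid gain
# r = t[None, :, 4:6] / anchors[:, None]   # raises: Expected all tensors to be on the same device ...

anchors = anchors.to(t.device)             # move the anchors onto the targets' device first
r = t[None, :, 4:6] / anchors[:, None]     # wh ratio, now computed entirely on cuda:0
print(r.shape)                             # torch.Size([3, 8, 2])
)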

WongKinYiu commented 2 years ago

As I remember, evolve cannot run on multiple GPUs.
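(For anyone wondering why: --evolve is a serial loop that calls train() once per generation in a single process, mutating the hyperparameters each time, so there is no DDP support. A condensed sketch of the general pattern, not the repo's exact code; mutate_hyp and train_once are hypothetical stand-ins:

import random

def mutate_hyp(hyp, scale=0.2):
    # hypothetical helper: jitter each hyperparameter around its current value
    return {k: v * (1.0 + scale * (random.random() - 0.5)) for k, v in hyp.items()}

def train_once(hyp):
    # stand-in for train(hyp.copy(), opt, device, ...): returns a fitness score
    return -abs(hyp['lr0'] - 0.01)  # dummy objective for illustration only

hyp, best = {'lr0': 0.02, 'momentum': 0.9}, None
for generation in range(10):          # the real loop runs for many more generations
    candidate = mutate_hyp(hyp)
    fitness = train_once(candidate)   # each generation is one full training run
    if best is None or fitness > best[0]:
        best, hyp = (fitness, candidate), candidate  # greedy: evolve from the best so far
print('best hyp:', best[1])
)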

WANGCHIENCHIH commented 2 years ago

OK, it seems to work:

python train.py --batch-size 32 --img 1280 1280 --data ../data.yaml --cfg ../data.cfg --weights '' --device --name yolor_p6-v2 --epochs 15 --evolve --single-cls

Issue #19 may be about the same problem.