amdegroot / ssd.pytorch

A PyTorch Implementation of Single Shot MultiBox Detector
MIT License

RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0 #173

Open 17764591637 opened 6 years ago

17764591637 commented 6 years ago

rps@rps:~/桌面/ssd.pytorch$ python3 train.py
/home/rps/桌面/ssd.pytorch/ssd.py:34: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  self.priors = Variable(self.priorbox.forward(), volatile=True)
/home/rps/桌面/ssd.pytorch/layers/modules/l2norm.py:17: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  init.constant(self.weight,self.gamma)
Loading base network...
Initializing weights...
train.py:214: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  init.xavier_uniform(param)
Loading the dataset...
Training SSD on: VOC0712
Using the specified args:
Namespace(basenet='vgg16_reducedfc.pth', batch_size=32, cuda=True, dataset='VOC', dataset_root='/home/rps/data/VOCdevkit/', gamma=0.1, lr=0.001, momentum=0.9, num_workers=4, resume=None, save_folder='weights/', start_iter=0, visdom=False, weight_decay=0.0005)
train.py:169: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
  targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
Traceback (most recent call last):
  File "train.py", line 255, in <module>
    train()
  File "train.py", line 178, in train
    loss_l, loss_c = criterion(out, targets)
  File "/home/rps/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rps/桌面/ssd.pytorch/layers/modules/multibox_loss.py", line 97, in forward
    loss_c[pos] = 0 # filter out pos boxes for now
RuntimeError: The shape of the mask [32, 8732] at index 0 does not match the shape of the indexed tensor [279424, 1] at index 0

Can anyone help, please?

knotgrass commented 3 years ago

If the loss is NaN, the learning rate may be too large, the batch size may be too small, or both.

EsakaK commented 2 years ago

There is still a problem. In step 1, it should be changed like this:

N = num_pos.data.sum().double()
loss_l = loss_l.double()/N
loss_c = loss_c.double()/N

Otherwise the loss will be NaN.
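For context, here is a minimal sketch of that normalization as a standalone function. The helper name `normalize_losses` and the toy tensors are made up for illustration, and the clamp on N is an extra assumption that is not part of the suggestion above: it only guards the case where a batch has no positive matches, which would otherwise give 0/0.

```python
import torch

def normalize_losses(loss_l, loss_c, num_pos):
    """Divide the summed losses by the number of positive (matched) priors."""
    # Cast to double, as suggested above, so the division is floating-point.
    N = num_pos.sum().double()
    # Assumption (not from the thread): keep N >= 1 so a batch with no
    # positive matches yields 0 instead of 0/0 = nan.
    N = torch.clamp(N, min=1.0)
    return loss_l.double() / N, loss_c.double() / N

# Toy values: 3 positives in the first image, 0 in the second.
num_pos = torch.tensor([[3], [0]])
loss_l, loss_c = normalize_losses(torch.tensor(6.0), torch.tensor(9.0), num_pos)

# Without the guard, an all-zero num_pos divides 0 by 0.
unguarded = torch.tensor(0.0).double() / torch.tensor(0).sum().double()

# With the guard, the same situation stays finite.
safe_l, safe_c = normalize_losses(torch.tensor(0.0), torch.tensor(0.0), torch.tensor([[0], [0]]))
```

If the NaN persists with this in place, the cause is probably elsewhere (for example the learning rate mentioned earlier in the thread).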

sonukiller commented 1 year ago

If you are using PyTorch 2, please follow this:

1) In multibox_loss.py, swap lines 97 and 98.

2) In train.py:
   Line ~183: replace loc_loss += loss_l.data[0] with loc_loss += loss_l.item()
   Line ~184: replace conf_loss += loss_c.data[0] with conf_loss += loss_c.item()
   Line ~188 (in the print call): replace loss.data[0] with loss.item()

This solved my problem!
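To show why the swap helps, here is a small sketch with toy sizes (4 images and 10 priors instead of 32 and 8732; note that 32 × 8732 = 279424, the flattened length in the error message). It assumes that the two swapped lines are the masking statement `loss_c[pos] = 0` and a reshape `loss_c = loss_c.view(num, -1)`; the tensors here are random stand-ins, not the real loss.

```python
import torch

num, num_priors = 4, 10                     # toy sizes; the issue has 32 and 8732
pos = torch.zeros(num, num_priors, dtype=torch.bool)
pos[0, 3] = True                            # pretend one prior matched a ground-truth box

# The per-prior confidence loss arrives flattened: shape [num * num_priors, 1].
loss_c = torch.rand(num * num_priors, 1)

# Original order: a [4, 10] mask cannot index a [40, 1] tensor.
try:
    loss_c[pos] = 0
    raised = False
except (RuntimeError, IndexError):          # the exception type depends on the PyTorch version
    raised = True

# Swapped order: reshape to [num, num_priors] first, then zero out the positives.
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0

# .item() replaces the old .data[0] for reading a scalar (0-dim) tensor.
loss = loss_c.sum()
running_total = 0.0
running_total += loss.item()
```

The `.item()` change is needed because a summed loss is a 0-dim tensor in current PyTorch, and indexing it with `[0]` is no longer allowed.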

zuliani99 commented 6 months ago

@sonukiller I'm still getting nan loss even with your suggestion and the previous one.

Do you suggest removing all the .data attributes and replacing Variable with a plain torch.Tensor?
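As a rough sketch of what that modernization could look like: since PyTorch 0.4, Variable and Tensor are merged, so the wrapper and the `volatile` flag can simply be dropped. The `annotations` list and the toy loss below are placeholders, not code from the repository, and this cleanup on its own may not be what removes the NaN.

```python
import torch

# Placeholder per-image targets (rows of box coordinates plus a label).
annotations = [torch.rand(2, 5), torch.rand(3, 5)]

# Old: targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
# Plain tensors do not track gradients unless requires_grad is set,
# so no wrapper is needed for the targets.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
targets = [ann.to(device) for ann in annotations]

# Old: some_tensor.data  ->  detach() returns a tensor cut off from the graph,
# and item() returns a Python number from a 0-dim tensor.
w = torch.ones(3, requires_grad=True)
loss = (w * 2).sum()
detached = loss.detach()
value = loss.item()

# Inference-only code that used volatile=True can run under no_grad instead.
with torch.no_grad():
    prediction = (w * 2).sum()
```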