Open bobo0810 opened 6 years ago
You should reload the data. What I mean is that you should copy
data_loader = data.DataLoader(dataset, args.batch_size, num_workers=args.num_workers, shuffle=True, collate_fn=detection_collate, pin_memory=True)
into the except block.
As @bobo0810 mentioned, this bug occurs because the batch iterator eventually runs through the whole dataset. So I guess this means it happens after exactly one epoch? Am I missing something here?
I just wonder why the developers did not implement an approach like for epoch in range(num_epochs): ... Wouldn't that make more sense?
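For comparison, an epoch-based loop might look like the sketch below. It is only an illustration: a plain list of batches stands in for data_loader, and train_by_epoch, train_step and num_epochs are hypothetical names that do not appear in train.py.

```python
def train_by_epoch(data_loader, num_epochs, train_step):
    """Run train_step on every batch, restarting the loader each epoch."""
    iteration = 0
    for epoch in range(num_epochs):
        # The inner for-loop creates a fresh iterator at the start of every
        # epoch, so StopIteration never escapes to the caller.
        for images, targets in data_loader:
            train_step(images, targets)
            iteration += 1
    return iteration


# Stand-in loader (assumption): three batches of placeholder data.
fake_loader = [("img0", "t0"), ("img1", "t1"), ("img2", "t2")]
seen = []
total = train_by_epoch(
    fake_loader,
    num_epochs=2,
    train_step=lambda images, targets: seen.append(images),
)
```

Since train.py counts iterations up to cfg['max_iter'], an epoch loop like this would still need the global iteration counter for anything scheduled by iteration.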
@visor2020 After testing, it turns out there is in fact no need to reload the dataset.
@bobo0810 However, the weights were not trained and saved by you. Is that right?
See line 122 of voc0712.py: __len__ returns len(self.ids), so the iterator only passes over the dataset once. There are two ways to solve this problem:
1. Replace the iterator with the form mentioned by @chi0tzp.
2. Change __len__() in voc0712.py to def __len__(self): return self.total_images,
where self.total_images is (cfg['max_iter'] - args.start_iter) * args.batch_size.
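A minimal sketch of the second option, using a hypothetical wrapper class RepeatedDataset rather than editing voc0712.py directly. One detail here is an assumption that the comment above does not spell out: once __len__ reports more items than really exist, a sampler will request indices beyond the real data, so __getitem__ has to wrap the index.

```python
class RepeatedDataset:
    """Present a finite dataset as if it held `total_images` items."""

    def __init__(self, base, total_images):
        self.base = base
        self.total_images = total_images

    def __len__(self):
        # Inflated length, so a single pass covers all training iterations.
        return self.total_images

    def __getitem__(self, index):
        if not 0 <= index < self.total_images:
            raise IndexError(index)
        # Wrap around so indices past the real data map back onto it.
        return self.base[index % len(self.base)]


base = ["a", "b", "c"]                      # stand-in for the real dataset
max_iter, start_iter, batch_size = 4, 0, 2  # illustrative values only
repeated = RepeatedDataset(base, (max_iter - start_iter) * batch_size)
items = [repeated[i] for i in range(len(repeated))]
```

With this approach the notion of an epoch disappears, which matters if any other code derives an epoch size from len(dataset).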
@bobo0810 You should change batch_iterator = iter(data_loader) to batch_iterator = None, and then add this at the beginning of the for loop:
if (not batch_iterator) or (iteration % epoch_size == 0):
    batch_iterator = iter(data_loader)
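The suggestion above can be sketched as follows. This is an illustration under assumptions: a list stands in for data_loader, run_iterations is a hypothetical helper, and epoch_size is taken to be no larger than the number of batches the loader yields per pass.

```python
def run_iterations(data_loader, start_iter, max_iter, epoch_size):
    """Consume batches by iteration count, resetting at epoch boundaries."""
    batch_iterator = None
    consumed = []
    for iteration in range(start_iter, max_iter):
        # Create the iterator on first use and again at every epoch boundary,
        # so next() is never called on an exhausted iterator.
        if batch_iterator is None or iteration % epoch_size == 0:
            batch_iterator = iter(data_loader)
        images, targets = next(batch_iterator)
        consumed.append(images)
    return consumed


# Stand-in loader (assumption): three batches of placeholder data.
fake_loader = [("img0", "t0"), ("img1", "t1"), ("img2", "t2")]
consumed = run_iterations(
    fake_loader, start_iter=0, max_iter=7, epoch_size=len(fake_loader)
)
```

If epoch_size were larger than the number of batches per pass, next() would still raise StopIteration before the reset, so the two values need to agree.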
@bobo0810 @visor2020 Hey! I have a question about train.py: why does it sometimes use 'ssd_net' and sometimes 'net'?
ssd_net = build_ssd('train', cfg['min_dim'], cfg['num_classes'])
net = ssd_net

if args.cuda:
    net = torch.nn.DataParallel(ssd_net)
    cudnn.benchmark = True

if args.resume:
    print('Resuming training, loading {}...'.format(args.resume))
    ssd_net.load_weights(args.resume)
else:
    vgg_weights = torch.load(args.save_folder + args.basenet)
    print('Loading base network...')
    ssd_net.vgg.load_state_dict(vgg_weights)

if args.cuda:
    net = net.cuda()

if not args.resume:
    print('Initializing weights...')
    # initialize newly added layers' weights with xavier method
    ssd_net.extras.apply(weights_init)
    ssd_net.loc.apply(weights_init)
    ssd_net.conf.apply(weights_init)

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=args.momentum,
                      weight_decay=args.weight_decay)
criterion = MultiBoxLoss(cfg['num_classes'], 0.5, True, 0, True, 3, 0.5,
                         False, args.cuda)
I can't understand it. Can you help me? Thank you.
@hust-kevin torch.nn.DataParallel returns a new model that wraps the original one for multi-GPU training (see DataParallel).
So why not use "ssd_net = torch.nn.DataParallel(ssd_net)"?
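One likely reason for keeping both names (an interpretation, not confirmed by the repository authors): the DataParallel wrapper does not forward custom methods or attributes of the wrapped model, so calls such as ssd_net.load_weights(...) need the original object, or the wrapper's .module attribute. The sketch below uses a hypothetical TinyNet as a stand-in for the SSD model.

```python
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the SSD model, with one custom method."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

    def load_weights(self, state_dict):
        # Custom helper, analogous in spirit to a load_weights method.
        self.load_state_dict(state_dict)


ssd_net = TinyNet()
net = nn.DataParallel(ssd_net)

# The wrapper is a different object that does not expose load_weights.
wrapper_has_method = hasattr(net, "load_weights")
# The original model stays reachable through .module.
same_object = net.module is ssd_net
# Both names refer to the same parameter tensors, so an optimizer built
# from net.parameters() updates ssd_net as well.
shared_params = all(
    p is q for p, q in zip(net.parameters(), ssd_net.parameters())
)
```

Rebinding ssd_net to the wrapper would work too, but every later access to custom members would then have to go through ssd_net.module.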
The comments are quite wonderful and useful.
At around line 165 of the code: images, targets = next(batch_iterator)
Bug description: The code above can only read through the dataset once. After one pass over the dataset, next() raises StopIteration and the program stops.
Solution: Change the code to:
# load train data
try:
    images, targets = next(batch_iterator)
except StopIteration:
    # start a new iteration
    batch_iterator = iter(data_loader)
    images, targets = next(batch_iterator)
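As a self-contained illustration of the fix, here is a minimal sketch that wraps the try/except in a hypothetical helper, next_batch, and uses a plain list as a stand-in for data_loader. Neither the helper nor the stand-in data comes from train.py.

```python
def next_batch(batch_iterator, data_loader):
    """Return (batch, iterator), restarting the iterator when exhausted."""
    try:
        batch = next(batch_iterator)
    except StopIteration:
        # The previous pass over the data is finished: start a new one.
        batch_iterator = iter(data_loader)
        batch = next(batch_iterator)
    return batch, batch_iterator


# Stand-in loader (assumption): three batches of placeholder data.
fake_loader = [("img0", "t0"), ("img1", "t1"), ("img2", "t2")]
batch_iterator = iter(fake_loader)
seen = []
for iteration in range(5):
    (images, targets), batch_iterator = next_batch(batch_iterator, fake_loader)
    seen.append(images)
```

Unlike resetting on iteration % epoch_size, this does not depend on knowing how many batches the loader yields per pass.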