You can use the dataloader, but then it would continue until it runs out of data (or you would need to use a break after N iterations). Using an iterator is more dataset-agnostic, and it is clearer how long the "epoch" is going to last (and a full epoch over the whole dataset could take a day, so that is not an option). Anyway, doing this:
for iteration, (images, targets, _) in tqdm(enumerate(dataloader), ncols=100, total=args.iterations_per_epoch):
    if iteration >= args.iterations_per_epoch:
        break
is equivalent to this:
dataloader_iterator = iter(dataloader)
for iteration in tqdm(range(args.iterations_per_epoch), ncols=100):
    images, targets, _ = next(dataloader_iterator)
I just find the second example cleaner than the first.
https://github.com/gmberton/CosPlace/blob/804fc8c65871cab81f8efbc2c0cff04f1ab78534/train.py#L106
Why not use torch.utils.data.dataloader directly?
It is also okay to use the standard DataLoader class, but using an InfiniteDataLoader ensures that if there are not enough samples the __iter__ will not raise a StopIteration. This could happen when changing the parameters within the code (e.g. using higher values for --iterations_per_epoch, --batch_size, --L, --N might lead to this issue when using a standard DataLoader).
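For example, here is a minimal sketch (my own toy example, not code from this repository) of what happens when a plain DataLoader iterator runs out of batches; restarting the iterator is exactly what InfiniteDataLoader automates:

import torch
from torch.utils.data import DataLoader, TensorDataset

toy_dataset = TensorDataset(torch.arange(20).float().unsqueeze(1))  # 20 samples
dataloader = DataLoader(toy_dataset, batch_size=8, drop_last=True)  # only 2 batches

iterator = iter(dataloader)
for iteration in range(5):  # ask for more batches than the loader holds
    try:
        (batch,) = next(iterator)
    except StopIteration:
        # A plain DataLoader iterator stops here; InfiniteDataLoader instead
        # rebuilds the iterator internally so the loop can keep going.
        iterator = iter(dataloader)
        (batch,) = next(iterator)
    print(iteration, batch.shape)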
dataloader_iterator = iter(dataloader)
for iteration in tqdm(range(args.iterations_per_epoch), ncols=100):
    images, targets, _ = next(dataloader_iterator)
What you mean is that with this approach, you can train on a portion of the data in each epoch without having to train on all the data, and only need to go through args.iterations_per_epoch iterations?
https://github.com/gmberton/CosPlace/blob/543eaa5995c64d7e2ddbe2d8a1587ec4b2964728/train.py#L106
dataloader = commons.InfiniteDataLoader(groups[current_group_num], num_workers=args.num_workers,
                                        batch_size=args.batch_size, shuffle=True,
                                        pin_memory=(args.device == "cuda"), drop_last=True)
dataloader_iterator = iter(dataloader)
When I use the small dataset, it has only one group containing 5,965 classes. If args.batch_size is set to 8, the dataloader holds 745 batches, meaning it can be iterated 745 times to train on all the data, so the statement images, targets, _ = next(dataloader_iterator) can also only be called 745 times. However, in the training loop args.iterations_per_epoch is set to 10,000, indicating that 10,000 iterations are expected; in reality only 745 iterations are possible, because next(dataloader_iterator) cannot iterate further after the 745th batch. If this is the case, your intention to train on only a portion of the data in each epoch may not be achieved.
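A quick arithmetic check of that batch count, using my own numbers from the scenario above and assuming one sample per class per pass with drop_last=True:

num_classes_in_group = 5965      # classes in the only group of the small dataset
batch_size = 8
iterations_per_epoch = 10_000

batches_per_pass = num_classes_in_group // batch_size   # 745 with drop_last=True
print(batches_per_pass)                                  # 745
print(iterations_per_epoch > batches_per_pass)           # True: a plain iterator would stop early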
class InfiniteDataLoader(torch.utils.data.DataLoader):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dataset_iterator = super().__iter__()

    def __iter__(self):
        return self

    def __next__(self):
        try:
            batch = next(self.dataset_iterator)
        except StopIteration:
            # The underlying iterator is exhausted: rebuild it and keep going
            # from the first batch instead of ending the loop.
            self.dataset_iterator = super().__iter__()
            batch = next(self.dataset_iterator)
        return batch
using an InfiniteDataLoader ensures that if there are not enough samples the iter will not raise a StopIteration.

When all samples are used up, the loader's __next__ catches the StopIteration and starts iterating from the first batch again. Is that correct?
When args.iterations_per_epoch=10,000 and the DataLoader has 800 batches, then after the 800th batch is consumed, the statement images, targets, _ = next(dataloader_iterator) in the for loop will start iterating from the first batch again. In this case, a single training loop will go over the data multiple times, and eventually one epoch will complete 10,000 iterations of training.
When args.iterations_per_epoch=10,000 and the DataLoader has 20,000 batches, a single training loop will only perform 10,000 iterations, training on only a portion of the data.
This ensures that regardless of whether the number of batches is greater than args.iterations_per_epoch, the training loop in one epoch will always iterate args.iterations_per_epoch times. Right?
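To make that concrete, here is a minimal sketch with toy numbers of my own (not from the repo), reusing the InfiniteDataLoader class quoted above; the loop always runs iterations_per_epoch steps no matter how many batches the loader holds:

import torch
from torch.utils.data import TensorDataset
from tqdm import tqdm

iterations_per_epoch = 20                      # stands in for args.iterations_per_epoch
toy_dataset = TensorDataset(torch.arange(64).float().unsqueeze(1),
                            torch.zeros(64).long())
dataloader = InfiniteDataLoader(toy_dataset, batch_size=8, shuffle=True, drop_last=True)  # 8 batches
dataloader_iterator = iter(dataloader)

steps = 0
for iteration in tqdm(range(iterations_per_epoch), ncols=100):
    images, targets = next(dataloader_iterator)   # wraps around after the 8th batch
    steps += 1
assert steps == iterations_per_epoch              # always exactly iterations_per_epoch steps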
Based on this setup, training on 8 groups of the processed dataset can lead to CosPlace achieving state-of-the-art performance?
All your assumptions are correct, and the answer to all your questions is yes.
Thank you for your help.
https://github.com/gmberton/CosPlace/blob/cea7de243abe1a59c69f1087b1b8001d01262c59/train.py#L114
Why not do it directly like this?