facebookresearch / unbiased-teacher

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection
https://arxiv.org/abs/2102.09480
MIT License
410 stars, 82 forks

Data loader implementation issue #18

Closed (vlfom closed this issue 3 years ago)

vlfom commented 3 years ago

Thanks a lot for releasing the code publicly, it is very helpful and clean.

I have a small doubt about the implementation of the data loader, specifically this portion of the code:

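It seems to me that, because the batches for labeled and unlabeled images are filled separately while both datasets are iterated in lockstep using `zip`, sooner or later some images will be "skipped" during training: whenever one batch is already full and the other is not, `zip` still advances both datasets, and the image drawn for the full side is not used.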

There seem to be theoretical worst cases in which this issue would cause half of the images in both datasets to be skipped. In practice, when the numbers of images in each aspect-ratio group are almost equal, and because the data loaders loop through the images infinitely, this probably has a negligible impact on experiments, although I am not sure about either point. I just wanted to bring this up; please let me know if I'm wrong.
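To make the concern concrete, here is a toy model of lockstep filling. It is a minimal sketch based only on the description above, not the repository's code: the function name `zip_fill`, the `(name, aspect_ratio_group)` item format, and the two-bucket layout are all assumptions made for illustration.

```python
def zip_fill(labeled, unlabeled, batch_size):
    """Toy model (hypothetical): items are (name, aspect_ratio_group) pairs."""
    label_buckets, unlabel_buckets = ([], []), ([], [])
    label_bucket, unlabel_bucket = [], []
    batches, skipped = [], []
    for d_label, d_unlabel in zip(labeled, unlabeled):
        # zip advances both datasets on every step, so an item that arrives
        # while its side is already full has nowhere to go.
        if len(label_bucket) != batch_size:
            label_bucket = label_buckets[d_label[1]]
            label_bucket.append(d_label)
        else:
            skipped.append(d_label)
        if len(unlabel_bucket) != batch_size:
            unlabel_bucket = unlabel_buckets[d_unlabel[1]]
            unlabel_bucket.append(d_unlabel)
        else:
            skipped.append(d_unlabel)
        # Emit a batch only when both sides are full, then reset both buckets.
        if len(label_bucket) == batch_size and len(unlabel_bucket) == batch_size:
            batches.append((list(label_bucket), list(unlabel_bucket)))
            label_bucket.clear()
            unlabel_bucket.clear()
    return batches, skipped


labeled = [("L0", 0), ("L1", 0), ("L2", 0), ("L3", 0)]
unlabeled = [("U0", 0), ("U1", 1), ("U2", 0), ("U3", 1)]
batches, skipped = zip_fill(labeled, unlabeled, batch_size=2)
print(skipped)  # [('L2', 0)]
```

In this toy run the labeled side fills up after two steps, while the unlabeled side needs a third step because its images alternate between groups. The labeled image drawn on that third step, `L2`, never enters a batch.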

One way to implement this properly would be to use two separate iterators, each of which lazily fetches images from its own dataset.
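A minimal sketch of that idea follows, assuming each item is a `(name, aspect_ratio_group)` pair. The names `grouped` and `lazy_batches` are hypothetical and are not taken from the repository. Each dataset gets its own generator, and a generator is only advanced when a new batch is requested from it, so one side never consumes images while waiting for the other.

```python
def grouped(dataset, batch_size):
    """Yield full batches from one dataset, bucketing items by group id."""
    buckets = ([], [])
    for item in dataset:
        bucket = buckets[item[1]]
        bucket.append(item)
        if len(bucket) == batch_size:
            yield list(bucket)
            bucket.clear()


def lazy_batches(labeled, unlabeled, batch_size):
    """Pair one complete labeled batch with one complete unlabeled batch."""
    # zip now runs over batches, not images: each generator pulls only as
    # many images from its own dataset as it needs for its next batch.
    yield from zip(grouped(labeled, batch_size), grouped(unlabeled, batch_size))


labeled = [("L0", 0), ("L1", 0), ("L2", 0), ("L3", 0)]
unlabeled = [("U0", 0), ("U1", 1), ("U2", 0), ("U3", 1)]
pairs = list(lazy_batches(labeled, unlabeled, batch_size=2))
used = [name for lab, unlab in pairs for name, _ in lab + unlab]
print(len(pairs), sorted(used))
```

On this toy input all eight images end up in a batch. With finite datasets, `zip` stops as soon as either generator is exhausted, so partially filled buckets (and possibly one already-built batch) can still be dropped at the very end; with loaders that cycle infinitely, as described above, that case does not arise.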

ycliu93 commented 3 years ago

Thanks @vlfom! That's indeed a better way to iterate labeled and unlabeled data!