facebookresearch / unbiased-teacher

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection
https://arxiv.org/abs/2102.09480
MIT License

Training Time #1

Closed · AmingWu closed this issue 3 years ago

AmingWu commented 3 years ago

Dear Authors:

How long does your model take to train?

chihyaoma commented 3 years ago

TL;DR: we take 1 additional forward pass for each training step.

- Normal training process: forward + backward
- Unbiased Teacher's training process: forward on labeled data, forward on unlabeled data, forward on unlabeled data again, then backward

Generally speaking, because we have extra unlabeled data, we need two extra forward passes per step. Note, however, that one of those passes is simply the cost of processing extra data: even if that additional dataset were labeled, it would still require one extra forward pass (think of the unlabeled dataset as an additional dataset). So the overhead attributable to Unbiased Teacher itself is 1 additional forward pass per training step.
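To make the step structure concrete, here is a minimal PyTorch-style sketch of one training step. All names here (`student`, `teacher`, the loss-dict interface, `ema_decay`) are illustrative assumptions, not this repo's actual API:

```python
import torch

def train_step(student, teacher, labeled_batch, unlabeled_batch,
               optimizer, ema_decay=0.9996):
    # (1) Forward on labeled data: standard supervised detection losses.
    sup_losses = student(labeled_batch)

    # (2) Forward on unlabeled data with the teacher (no gradients);
    #     its thresholded predictions become pseudo-labels.
    with torch.no_grad():
        pseudo_labels = teacher(unlabeled_batch)

    # (3) Forward on unlabeled data again with the student,
    #     supervised by the teacher's pseudo-labels.
    unsup_losses = student(unlabeled_batch, targets=pseudo_labels)

    # One backward pass over the combined loss, as in normal training.
    loss = sum(sup_losses.values()) + sum(unsup_losses.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # The teacher is updated as an EMA of the student, not by gradients.
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(ema_decay).add_(s, alpha=1 - ema_decay)
```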

Total training length (number of training steps):

In our experiments on MS-COCO with 120k additional unlabeled images, we surpass the supervised baseline trained for 3x (270k iterations; 1x = 90k iterations) even when we train for only 2x (180k iterations). In other words, we can match the supervised baseline's performance with fewer training iterations. Performance keeps improving as we train longer: in the paper (arXiv) we report numbers at 3x, but we see it continue to improve at 4x.
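For reference, the schedule is just the solver's iteration budget. A minimal sketch assuming the detectron2-style config this codebase builds on (the exact key names in this repo's configs may differ):

```python
from detectron2.config import get_cfg

cfg = get_cfg()
# On MS-COCO, 1x = 90k iterations, so 2x = 180k and 3x = 270k.
cfg.SOLVER.MAX_ITER = 180_000  # 2x already surpasses the 3x supervised baseline above
```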

Can it get faster?

Unbiased Teacher can resume from a previous checkpoint. Say you have a model (the best checkpoint you have) currently running in production: we can start Unbiased Teacher training from that checkpoint and begin leveraging unlabeled data. Using the MS-COCO scenario above as an example, we can start from a supervised baseline already trained for 3x. In this setting you should see performance increase immediately once training with the additional unlabeled data begins.
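A hypothetical warm-start sketch (the checkpoint path and the `"model"` key are assumptions based on detectron2's checkpoint format, not this repo's exact resume mechanism; both student and teacher start from the same supervised weights):

```python
import copy
import torch

# Load an existing supervised checkpoint (placeholder path).
checkpoint = torch.load("output/supervised_3x/model_final.pth")
student.load_state_dict(checkpoint["model"])

# Initialize the teacher as a copy of the student; afterwards it is
# updated only by EMA, never by gradient descent.
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
```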

How long should I train?

As stated above, if performance improves immediately, how long should we train? It depends on how much room for improvement your baseline model (trained on labeled data) has, on the size of the unlabeled dataset, and so on. For instance, if the amount of labeled data is small and the unlabeled set is large, you may want to train longer with the unlabeled data, since a supervised model trained on so little data is likely to perform poorly and thus has more to gain.