facebookresearch / adaptive_teacher

This repo provides the source code for "Cross-Domain Adaptive Teacher for Object Detection".

How to get AP50 results for each class as reported in the paper? #15

Closed tyunit closed 2 years ago

yujheli commented 2 years ago

Follow Section 4.2 in the paper and set exactly the same parameters (batch size of 16); then you can reproduce the results reported in the paper. Actually, setting the unsupervised loss weight to 0.5 or 0.25 can give even better results than we reported in the paper, which means the performance of our model can be further improved by sweeping more hyperparameters.

tyunit commented 2 years ago

A batch size of 16 does not work on a single GPU, so does that mean I can't get AP50 results using a single GPU?

michaelku1 commented 2 years ago

A batch size of 16 does not work on a single GPU, so does that mean I can't get AP50 results using a single GPU?

You may try gradient accumulation. I trained the model with gradient accumulation and got results close to the reported ones.

tyunit commented 2 years ago

How can I do that? Could you elaborate, please? Should I update the config file?

michaelku1 commented 2 years ago

You should change the code in the trainer file so that, within each step(), loss.backward() is called for as many sub-iterations as needed to reach the effective batch size, and then you do the optimizer update all at once. For example, if your effective batch size should be 16 but your GPU can only fit 4 images at most, you would change the batch size in the config file to 4 and then write a for loop that calls loss.backward() 4 times (4 x 4 = 16); then you do the update outside the for loop with all the gradients accumulated. Obviously, this is going to take more training time, but it's the only way when you have a limited number of GPUs or limited memory per GPU. As for the implementation, you should check out PyTorch's example of gradient accumulation.
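
For concreteness, here is a minimal, self-contained sketch of that pattern in plain PyTorch (not code from this repository; the tiny linear model and random data are placeholders for the detector and its data loader):

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer, and data standing in for the detector, its
# optimizer, and its data loader; sub-batches of 4 with 4 accumulation steps
# give an effective batch size of 16.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]

ACCUM_STEPS = 4

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = nn.functional.mse_loss(model(x), y)  # forward pass on one small sub-batch
    (loss / ACCUM_STEPS).backward()             # scale so the summed gradients match one batch of 16
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()                        # single update with the accumulated gradients
        optimizer.zero_grad()
```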


tyunit commented 2 years ago

Thank you, I really appreciate your detailed explanation. I took a look at the PyTorch examples, but since I am not familiar with them it isn't easy for me to make the update, so I would appreciate it if you could provide the updated trainer file or the code with the specific lines to replace.

yujheli commented 2 years ago

@michaelku1 I really appreciate your help. I also just learned about gradient accumulation.

michaelku1 commented 2 years ago

Thank you, I really appreciate your detailed explanation. I took a look at the PyTorch examples, but since I am not familiar with them it isn't easy for me to make the update, so I would appreciate it if you could provide the updated trainer file or the code with the specific lines to replace.

Though I'm not 100% sure, this implementation allowed me to train well on a single GPU with an effective batch size of 16:

[Screenshot: modified trainer code implementing gradient accumulation]
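
Since the screenshot does not survive in this text form, here is a rough, hypothetical reconstruction of the idea applied to a Detectron2-style run_step; the attribute names (self.model, self.optimizer, self._data_loader_iter) and ACCUM_ITERS are assumptions, not the repository's actual code:

```python
class GradAccumStepMixin:
    # Hypothetical sketch only -- not the code from the screenshot above.
    ACCUM_ITERS = 4  # with SOLVER.IMS_PER_BATCH = 4, 4 x 4 = effective batch size 16

    def run_step(self):
        self.optimizer.zero_grad()
        for _ in range(self.ACCUM_ITERS):
            data = next(self._data_loader_iter)                  # one small sub-batch
            loss_dict = self.model(data)                         # dict of individual loss terms
            losses = sum(loss_dict.values()) / self.ACCUM_ITERS  # keep the gradient scale comparable
            losses.backward()                                    # accumulate gradients across sub-batches
        self.optimizer.step()                                    # one parameter update per effective batch
```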

michaelku1 commented 2 years ago

@michaelku1 I really appreciate your help. I also just learned about gradient accumulation.

However, even though gradients are accumulated, batch-norm statistics are not, and this may lead to a slight performance discrepancy between a model trained with gradient accumulation and one trained without it.
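
One mitigation sometimes used for this (not discussed in this thread, so treat it as an assumption) is to put the BatchNorm layers in eval mode so they normalize with their stored running statistics rather than the small per-step batch statistics:

```python
import torch.nn as nn

def set_bn_eval(model: nn.Module) -> None:
    # Switch every BatchNorm layer to eval mode; affine parameters stay trainable,
    # but running mean/var are no longer updated from the small sub-batches.
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()

# Note: model.train() resets submodules to training mode, so this would need to be
# called again after each model.train() call (e.g. at the start of every step).
```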