Doubts on Noisy Student

anuj-sharma-19 commented 2 years ago

Hi,

First of all, thank you for the great work of putting different semi-supervised methods on lidar point clouds into a single repo.

I have a question about Noisy Student training. The Noisy Student config https://github.com/PointsCoder/ONCE_Benchmark/blob/master/tools/cfgs/once_models/semi_learning_models/noisy_student_second_large.yaml does not seem to add dropout (DP_RATIO) to the model, although the Noisy Student paper suggests adding it. Am I missing something?

Also, Noisy Student training seems to run for only 1 cycle, instead of the 3 cycles used in the original paper. Could you please let me know whether the multi-cycle experiment lowered performance compared to a single cycle?

Comparing the Noisy Student and Pseudo Labels configs, the only difference appears to be that the random augmentations random_world_flip and random_world_rotation are not applied to the Student model in Pseudo Labels. Could you please confirm whether that is the only difference between them?

Looking forward to your reply.

Thank You !! Anuj

PointsCoder commented 2 years ago

@anuj-sharma-19 Thanks for your attention to our work!

We turned off dropout when generating labels, which is the same as inference. We haven't run experiments with dropout added. Please let us know if you get a better result with dropout enabled.
We kept 3 cycles to maintain a fair comparison with other approaches. We didn't explore the effects of different numbers of training cycles.
There are actually 2 differences. The first, as you mentioned, is that Pseudo Labels uses NO augmentation, which leads to a much worse result. The second is that in Noisy Student we replace the teacher with the new student after the first round of S-T semi-training, which can be done by simply changing the checkpoint that is loaded.
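
For readers following the dropout point above, here is a minimal pure-Python sketch of why switching a model to inference mode disables dropout. It is an illustrative stand-in for standard inverted dropout, not code from this repo; the `dropout` helper, the feature values, and the rate of 0.5 are all assumptions made for the example.

```python
import random


def dropout(values, p, training, rng):
    """Inverted dropout on a list of floats (hypothetical helper, not repo code).

    In training mode each value is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p). In inference mode the input
    passes through unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
features = [1.0, 2.0, 3.0, 4.0]

# Teacher generating pseudo-labels: inference mode, so dropout is a no-op.
teacher_out = dropout(features, p=0.5, training=False, rng=rng)

# A student trained with dropout would see randomly masked features instead.
student_out = dropout(features, p=0.5, training=True, rng=rng)

print(teacher_out)
print(student_out)
```

Per the reply above, the released config does not enable dropout for the student either, so the training branch here only shows what adding it would change.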

anuj-sharma-19 commented 2 years ago

Hi,

Thanks for your quick response and the clarifications!!

So, to confirm: in the code, the Teacher does not use dropout, i.e. it runs in inference mode, and the Student does not use dropout either.
Noisy Student is trained for 3 cycles, with the Teacher being replaced by the Student from the previous cycle. Are the other methods, i.e. Mean Teacher, SESS, and 3D-IoU-Match, also trained for 3 cycles in the same way, or is it just Noisy Student? If not, does that mean Noisy Student is effectively trained for 3 x 150 epochs, whereas the rest are trained for only 150 epochs?

Thanks !! Anuj

PointsCoder commented 2 years ago

@anuj-sharma-19

The strategy for switching between training and eval mode is in semi_train.py; you can check the dropout here:
Noisy Student differs from the other methods: we actually train it for 150x2 epochs, replacing the teacher with the student once. The rest are trained for 150 epochs.
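
The schedule described above (two rounds of 150 epochs, with the teacher replaced once by the previous student) can be sketched as follows. This is only an illustration of the bookkeeping: `Round`, `plan_rounds`, and the checkpoint file names are hypothetical and do not come from the repo, where the swap is done by changing which checkpoint is loaded.

```python
from dataclasses import dataclass


@dataclass
class Round:
    index: int
    teacher_ckpt: str  # checkpoint loaded as the teacher for this round
    student_ckpt: str  # checkpoint the student writes at the end of the round
    epochs: int


def plan_rounds(pretrained_ckpt, num_rounds, epochs_per_round):
    """List the teacher/student checkpoints used in each round (hypothetical helper)."""
    rounds = []
    teacher = pretrained_ckpt
    for i in range(1, num_rounds + 1):
        student = f"student_round{i}.pth"  # hypothetical file name
        rounds.append(Round(i, teacher, student, epochs_per_round))
        teacher = student  # the new student becomes the next teacher
    return rounds


rounds = plan_rounds("pretrained_teacher.pth", num_rounds=2, epochs_per_round=150)
for r in rounds:
    print(f"round {r.index}: teacher={r.teacher_ckpt} -> student={r.student_ckpt} ({r.epochs} epochs)")

total_epochs = sum(r.epochs for r in rounds)
print(total_epochs)  # 300, versus 150 for the single-round methods
```

Under these assumptions Noisy Student gets twice the semi-supervised training budget of the other methods, which is worth keeping in mind when comparing their results.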

anuj-sharma-19 commented 2 years ago

Okay, thanks a lot for the clarifications. :+1:
