Thanks for your work! I have some questions about model distillation.
"we leverage the same training loop with a few exceptions: we use a larger
model as a frozen teacher, keep a spare EMA of the st…
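To check my understanding, here is a minimal sketch of how I read that description (the Linear stand-ins, update_ema, and the 0.999 decay are my assumptions, not your code):

import copy
import torch
import torch.nn as nn

student = nn.Linear(128, 10)                 # stand-in for the student network
teacher = nn.Linear(128, 10).eval()          # stand-in for the larger, frozen teacher
for p in teacher.parameters():
    p.requires_grad_(False)                  # the teacher is never updated

ema_student = copy.deepcopy(student).eval()  # shadow copy of the student, tracked by EMA

@torch.no_grad()
def update_ema(online: nn.Module, ema: nn.Module, decay: float = 0.999) -> None:
    # ema_param <- decay * ema_param + (1 - decay) * online_param
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)

update_ema(student, ema_student)             # called once per optimizer step

Is this roughly what is meant by keeping an EMA of the student?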
-
The original code looks like this:

import torch.nn.functional as F

# KL divergence between the student's log-probabilities and the soft targets
kl_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                   targets, reduction='batchmean')

Although the relative loss curve is the…
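One subtlety that may matter here: F.kl_div expects its second argument to be probabilities (or log-probabilities with log_target=True), not raw logits. A sketch of the temperature-scaled variant I would have expected (the temperature T and the softmax over the teacher logits are my assumptions):

import torch
import torch.nn.functional as F

def distillation_kl(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    T: float = 2.0) -> torch.Tensor:
    # F.kl_div wants log-probabilities as input and probabilities as target.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # 'batchmean' divides by the batch size, matching the mathematical KL;
    # the T * T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * (T * T)

Is that what the targets tensor already contains, or are raw logits being passed in?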
-
Hi, respect for your awesome work! I have a question about the training. In the backtracking stage, the generator's timestep is fixed at 399, and the timesteps of the student and teacher are randomly sampled …
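To make the question concrete, here is how I currently read the timestep setup (the schedule length of 1000 and the uniform sampling are my assumptions, not the repository's code):

import torch

GENERATOR_T = 399            # fixed generator timestep, as described above
NUM_TRAIN_TIMESTEPS = 1000   # assumed length of the diffusion schedule
batch_size = 8

# The generator always runs at the fixed timestep.
gen_t = torch.full((batch_size,), GENERATOR_T, dtype=torch.long)

# Student and teacher timesteps are drawn uniformly at random per example.
student_t = torch.randint(0, NUM_TRAIN_TIMESTEPS, (batch_size,))
teacher_t = torch.randint(0, NUM_TRAIN_TIMESTEPS, (batch_size,))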
-
Is self-distillation only implemented for the image-processing models? videomamba_distill.py is only in the image_sm/models folder. I would like to use VideoMamba as a backbone for video 3D pose estimation…