I used an MSE loss and decoupled distillation, just as in the paper, but the results are very poor. Could you share the specific hyperparameter settings, such as the learning rate? Also, were all layers of the image encoder distilled, or only its final output? Thanks~