@zdgithub The overall pipeline is: 1) get a pre-trained model; 2) fine-tune it by inserting scale and shift factors, and modify the head layer to adapt to the current task; 3) re-parameterize the model for inference. During re-parameterization, the scale and shift factors are absorbed into the backbone. In your case, the head layer is for 100 classes. Apart from the head layer, which adapts to the current task, no other parts need to be modified. Thanks for pointing out this description. I will revise it to remove the ambiguity.
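To make the absorption step concrete, here is a minimal sketch of how scale/shift factors can be folded into a single linear layer so that inference uses the original architecture. The dimensions, tensor names, and per-output-channel application of `gamma`/`beta` are illustrative assumptions, not code from the repo:

```python
import torch
import torch.nn as nn

# Hypothetical fine-tuned parameters for one linear layer:
# gamma (scale) and beta (shift) are learned per output channel.
out_dim, in_dim = 768, 768
linear = nn.Linear(in_dim, out_dim)   # frozen pre-trained weights
gamma = torch.randn(out_dim)          # learned scale factors
beta = torch.randn(out_dim)           # learned shift factors

# During fine-tuning the layer computes:
#   y = gamma * (W x + b) + beta
# Re-parameterization folds gamma/beta into W and b so inference
# runs a plain linear layer with no extra parameters:
#   W' = diag(gamma) @ W,   b' = gamma * b + beta
with torch.no_grad():
    reparam = nn.Linear(in_dim, out_dim)
    reparam.weight.copy_(gamma.unsqueeze(1) * linear.weight)
    reparam.bias.copy_(gamma * linear.bias + beta)

# Sanity check: both paths produce the same output.
x = torch.randn(4, in_dim)
y_ssf = gamma * linear(x) + beta
y_rep = reparam(x)
assert torch.allclose(y_ssf, y_rep, atol=1e-5)
```

Under this assumption the inference-time network is architecturally identical to the pre-trained backbone (plus the new task head), which is why only the head layer changes.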
If the pre-trained model was trained on 1000 classes and we fine-tune it on a downstream task with 100 classes, then during inference, the paper says we should use the frozen pre-trained model with the 1000-class head layer (without modifying the network architecture), doesn't it?