Open aidevmin opened 10 months ago
Hi, thank you for your interest in our paper. We do not really need a spatial attention mask for regression because the regression loss is only applied to foreground areas. For coupled heads, I think you could just try distilling your model following the classification distillation strategy.
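For illustration only, here is a minimal sketch of a spatially masked classification distillation loss of the kind discussed above. It is not taken from the PGD codebase. The function name `masked_cls_distill_loss`, the per-location `mask` weights, and the choice of binary cross-entropy between sigmoid outputs are all assumptions made for the example. For a coupled head, one might apply something like this to the classification channels of the shared output. A real implementation would operate on tensors rather than Python lists.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def masked_cls_distill_loss(student_logits, teacher_logits, mask, eps=1e-12):
    """Hypothetical masked classification distillation loss.

    student_logits, teacher_logits: one list of class logits per spatial location.
    mask: one non-negative weight per spatial location (assumed to be given,
          e.g. derived from the teacher; how it is built is outside this sketch).
    Returns the mask-weighted binary cross-entropy between teacher and student
    class probabilities, normalised by the total mask weight.
    """
    total = 0.0
    for s_loc, t_loc, w in zip(student_logits, teacher_logits, mask):
        for s, t in zip(s_loc, t_loc):
            p_t = sigmoid(t)  # soft target from the teacher
            # Clamp the student probability so log() stays finite.
            p_s = min(max(sigmoid(s), eps), 1.0 - eps)
            bce = -(p_t * math.log(p_s) + (1.0 - p_t) * math.log(1.0 - p_s))
            total += w * bce
    # Avoid division by zero when the mask is empty.
    return total / max(sum(mask), eps)


# Two locations, two classes; only the first location is inside the mask.
teacher = [[2.0, -1.0], [0.5, 0.0]]
student = [[-2.0, 1.0], [0.5, 0.0]]
loss = masked_cls_distill_loss(student, teacher, mask=[1.0, 0.0])
```

Locations with zero mask weight contribute nothing, so the student is only pushed towards the teacher where the mask is active.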
@ChenhongyiYang Thank you for the suggestion. I also thought of using a KD loss that follows the classification strategy.
Did you try any experiments with PGD on a coupled head? In that case the classification and regression heads are not separated. I want to apply your method to my model, which has a coupled head, so I think I need to modify the loss function and the mask. Also, why do you use a spatial attention mask for the classification task but not for the regression task?