Ayews / M3Net

The implementation of 'M3Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection'.
MIT License

The pretrained backbone weights #4

Closed lxpqxl closed 1 year ago

lxpqxl commented 1 year ago

Dear author:

I would like to express my sincere gratitude for sharing your code and paper openly.

I have a specific question regarding the pretrained weights of ResNet50 and Swin Transformer. Were these weights trained on ImageNet-1K? I have observed some discrepancies between your ResNet50 implementation and the one in torchvision.

Thank you for your time and consideration. I greatly appreciate your contribution to the research community and look forward to your response.

Ayews commented 1 year ago

Thank you for your interest in our work.

In fact, the ResNet50 implementation we provide, along with its pretrained weights, is taken from previous SOD (salient object detection) work, so we cannot confirm whether those weights were trained on ImageNet-1K. However, we believe this discrepancy has minimal impact: using the ResNet50 implementation from torchvision together with the pretrained weights downloaded from the PyTorch website should yield results close to ours. Note that we modified the forward method to return feature maps from different levels of the backbone and removed the final fully connected layer, which is common practice in end-to-end tasks.
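
For reference, the pattern looks roughly like the sketch below (this is not our exact code; the wrapper class name and torchvision's ImageNet-1K weight enum are illustrative stand-ins):

```python
# Hedged sketch: wrap torchvision's ResNet-50 so forward() returns the
# four stage feature maps; the avgpool/fc head is simply never called.
import torch
from torchvision.models import resnet50, ResNet50_Weights

class ResNet50Backbone(torch.nn.Module):
    def __init__(self):
        super().__init__()
        net = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
        # Stem: conv1 -> bn1 -> relu -> maxpool (overall stride 4)
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        # Keep the four residual stages; avgpool and fc are dropped.
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4

    def forward(self, x):
        x = self.stem(x)
        f1 = self.layer1(x)   # stride 4,  256 channels
        f2 = self.layer2(f1)  # stride 8,  512 channels
        f3 = self.layer3(f2)  # stride 16, 1024 channels
        f4 = self.layer4(f3)  # stride 32, 2048 channels
        return f1, f2, f3, f4

feats = ResNet50Backbone().eval()(torch.randn(1, 3, 224, 224))
print([tuple(f.shape) for f in feats])
# [(1, 256, 56, 56), (1, 512, 28, 28), (1, 1024, 14, 14), (1, 2048, 7, 7)]
```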

As for Swin Transformer, we used the pretrained weights released by its authors, trained on ImageNet-22K. Similarly, we made slight modifications to the forward method to obtain feature maps at different scales.
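
A rough equivalent via the timm library (an assumption for illustration only; we modified the official Swin implementation directly, and the model/weight tag below follows recent timm naming, not necessarily the exact checkpoint we used):

```python
# Hedged sketch: timm can expose per-stage Swin features directly.
import timm
import torch

swin = timm.create_model(
    "swin_base_patch4_window12_384.ms_in22k",  # ImageNet-22K pretrained tag
    pretrained=True,
    features_only=True,  # return per-stage feature maps instead of logits
)
with torch.no_grad():
    feats = swin(torch.randn(1, 3, 384, 384))
for f in feats:
    # Four stages at strides 4/8/16/32; note that recent timm versions
    # emit Swin features in NHWC layout rather than NCHW.
    print(tuple(f.shape))
```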

If you encounter any other issues, please feel free to contact us.

lxpqxl commented 1 year ago


Thank you for the prompt reply.

I have no other questions. I wish you continued success with your work!