First of all, I would like to thank you for your outstanding work on integrating Swin Transformer with Mask R-CNN and providing the valuable resources and codebase to the community.
I have had difficulty reproducing the accuracy reported in your paper: my results are considerably lower. Here are a few key metrics from my training run:
Bounding Box Detection (bbox):
AP @ IoU=0.50:0.95 (all areas, maxDets=100): 0.181
AP @ IoU=0.50 (all areas, maxDets=100): 0.403
AP @ IoU=0.75 (all areas, maxDets=100): 0.128
AP @ IoU=0.50:0.95 (medium areas, maxDets=100): 0.191
AP @ IoU=0.50:0.95 (large areas, maxDets=100): 0.214
Segmentation (segm):
AP @ IoU=0.50:0.95 (all areas, maxDets=100): 0.104
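For easier comparison with tables that report AP in percentage points, the values above can be converted with a minimal sketch like the one below. The dictionary keys and the helper name `to_percent` are my own labels for illustration, not names from the codebase.

```python
# COCO-style AP values copied from the results listed above (fractions in [0, 1]).
# The key names are hypothetical labels chosen for this example.
reported = {
    "bbox_AP": 0.181,     # IoU=0.50:0.95, all areas
    "bbox_AP50": 0.403,   # IoU=0.50, all areas
    "bbox_AP75": 0.128,   # IoU=0.75, all areas
    "bbox_AP_m": 0.191,   # IoU=0.50:0.95, medium areas
    "bbox_AP_l": 0.214,   # IoU=0.50:0.95, large areas
    "segm_AP": 0.104,     # IoU=0.50:0.95, all areas
}


def to_percent(ap: float) -> float:
    """Convert a fractional AP to percentage points, rounded to one decimal."""
    return round(ap * 100, 1)


# Build a table of the same metrics expressed in percentage points.
as_percent = {name: to_percent(value) for name, value in reported.items()}

for name, value in as_percent.items():
    print(f"{name}: {value}")
```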
My training configuration is as follows:
Could you share the pretrained weight file? Having it would make your results much easier to reproduce.