This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.
I trained a Swin Transformer for Cityscapes instance segmentation, starting from a Swin-Base model pretrained on ImageNet-22K.
During training, the evaluation results look normal.
For example:
2021-06-29 00:57:35,334 - mmdet - INFO - Evaluating segm...
Loading and preparing results...
DONE (t=0.44s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=7.74s).
Accumulating evaluation results...
DONE (t=0.63s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.339
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.590
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.132
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.317
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.530
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.412
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.412
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.412
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.182
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.386
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.620
However, when I run test.py with an epoch_*.pth checkpoint file, every AP and AR value is -1. If I export the predictions, the images are all black.
This is the testing output:
apex is not installed
apex is not installed
apex is not installed
apex is not installed
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Use load_from_local loader
[ ] 0/1525, elapsed: 0s, ETA:/usr/local/lib/python3.7/dist-packages/torch/utils/checkpoint.py:25: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[>>] 1525/1525, 1.4 task/s, elapsed: 1085s, ETA: 0sloading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Evaluating bbox...
Loading and preparing results...
DONE (t=0.73s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=2.28s).
Accumulating evaluation results...
DONE (t=0.55s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = -1.000
Evaluating segm...
Loading and preparing results...
DONE (t=0.71s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=2.37s).
Accumulating evaluation results...
DONE (t=0.56s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = -1.000
{'bbox_mAP': -1.0, 'bbox_mAP_50': -1.0, 'bbox_mAP_75': -1.0, 'bbox_mAP_s': -1.0, 'bbox_mAP_m': -1.0, 'bbox_mAP_l': -1.0, 'bbox_mAP_copypaste': '-1.000 -1.000 -1.000 -1.000 -1.000 -1.000', 'segm_mAP': -1.0, 'segm_mAP_50': -1.0, 'segm_mAP_75': -1.0, 'segm_mAP_s': -1.0, 'segm_mAP_m': -1.0, 'segm_mAP_l': -1.0, 'segm_mAP_copypaste': '-1.000 -1.000 -1.000 -1.000 -1.000 -1.000'}
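For context on the -1 values, here is a minimal sketch of how a COCO-style summary can end up at -1. It assumes the evaluator follows the pycocotools convention, where precision and recall entries start at -1 and the summary averages only the entries greater than -1. The function name `summarize_mean` and the sample values are hypothetical and only illustrate that convention.

```python
def summarize_mean(entries):
    """Average the valid entries; -1 is the 'no data' sentinel (assumed convention)."""
    valid = [e for e in entries if e > -1]
    if not valid:
        # Nothing was ever filled in, so the summary itself stays at -1.
        return -1.0
    return sum(valid) / len(valid)


# Hypothetical case 1: ground truth exists, so some entries were filled in.
with_gt = [0.5, 0.25, -1.0]

# Hypothetical case 2: no ground truth was matched, so every entry
# keeps its initial value of -1.
without_gt = [-1.0, -1.0, -1.0]

print(summarize_mean(with_gt))
print(summarize_mean(without_gt))
```

If this convention applies here, a summary where every metric is -1 suggests the evaluator found no usable ground truth for the 1525 test images, so it may be worth checking that the annotation file passed to test.py actually contains instances for those images.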
Compared with the default config, I changed SyncBN to BN, changed some AutoAugment policies, and changed use_checkpoints from False to True to save GPU memory.
During inference there is a warning: "None of the inputs have requires_grad=True. Gradients will be None". Does this have anything to do with my output? How can I fix this?
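On the warning itself: the path in the log shows it is raised from torch/utils/checkpoint.py, and it only says that no gradients will be produced, which is expected when nothing requires gradients, as at inference. Activation checkpointing does not change the trained weights, so one way to silence the warning is to switch it off for testing. The fragment below is a sketch of such an override, assuming an mmdet-style Python config; the key name is copied from this post and may be spelled differently in the actual backbone config.

```python
# Hypothetical test-time override (assumed config layout, not taken from the repo).
# The key name `use_checkpoints` is copied from the post; check the real
# backbone config for the exact spelling before using it.
model = dict(
    backbone=dict(
        # Checkpointing trades compute for memory during backpropagation,
        # so it brings no benefit when running inference only.
        use_checkpoints=False,
    )
)
```

Turning the flag off should only remove the warning; it is not expected to change the predictions or the -1 metrics.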
Thank you!