By the way, I understand that some of these settings are legacy from SSD, but since you use RetinaNet in your paper, I really could not see the point of keeping them.
Sorry for the confusion. AP-loss suffers from high computational complexity, so we use only 2 anchor scales (which should not have a big impact on the final performance, as reported in the Focal Loss paper) and a smaller, fixed input size (512×512) for training images. These settings significantly reduce the memory cost, which lets us use a larger batch size, and they also speed up training. We also use an SSD-like augmentation strategy (for better performance). Consequently, the number of training epochs needs to be much larger than 12 to fully train the model.
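For concreteness, these settings roughly correspond to a config like the following (a minimal sketch; the key names are illustrative, the actual entries live in `lib/config.py`):

```python
# Minimal sketch of the training settings described above.
# Key names are illustrative; the real values live in lib/config.py.
config = {
    'anchor_scales': [2 ** 0, 2 ** (1.0 / 2.0)],  # 2 scales instead of RetinaNet's 3
    'input_size': 512,        # fixed 512x512 training input instead of [800, 1333] resizing
    'epochs': 100,            # much longer schedule than the standard 1x (~12 epochs)
    'batch_size_per_gpu': 8,  # larger per-GPU batch, feasible thanks to the smaller input
    'augmentation': 'ssd',    # SSD-style augmentation for better performance
}
```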
Thanks for your reply! I tried a standard 1x RetinaNet setting on COCO with your AP-loss, and it does not improve the performance. Is this caused by the shorter training time? (I mean, maybe AP-loss can raise the upper bound, but not within a 1x schedule.)
I am not very sure. I can think of a few possible reasons:
1. Batch size. What is the batch size on one GPU? This matters because the ranking is computed separately on each GPU device.
2. Learning rate. Since the training time is shorter, did you use a larger learning rate (e.g. 0.01)?
3. Training time. As you said, maybe a 1x schedule is not enough for AP-loss to fully train the model. How does AP-loss perform on the training set (compared to Focal Loss)?
With my setting, I got a RetinaNet with 34.6 mmAP, while the mmAP of the standard RetinaNet is 35.9.
Maybe the difference comes from the smaller batch size. We have 8 images on one GPU, so the ranking is more stable than with a batch size of 2: each combination of images is effectively a new sample in the ranking sense, and a sample containing fewer images is more likely to be biased.
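To illustrate why the per-GPU batch size matters, here is a toy sketch (not the error-driven AP-loss update from the paper; the function and data are made up): the ranking pairs are formed only among the anchors of the images on one GPU, so a larger local batch gives many more positive-negative pairs and a less noisy ranking.

```python
import torch

def toy_ranking_loss(scores, labels):
    """Toy AP-style ranking loss over all anchors in the local (per-GPU) batch.

    Simplified illustration only; the actual AP-loss uses an error-driven
    update rather than this differentiable surrogate.
    scores: (N,) predicted scores for every anchor of every image on this GPU.
    labels: (N,) 1 for positive anchors, 0 for negative anchors.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Every (positive, negative) pair within the local batch contributes to the
    # ranking; with 8 images per GPU there are far more pairs than with 2,
    # so the estimate of the ranking error is much less noisy.
    diff = neg.unsqueeze(0) - pos.unsqueeze(1)   # (P, M) pairwise margins
    return torch.sigmoid(diff).mean()            # soft fraction of mis-ranked pairs

# Pairs are never formed across GPUs: images on other devices do not enter the
# ranking, which is why the per-GPU batch size (2 vs. 8) makes a difference.
```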
Hi, I am reading your repo, and some of the code really confused me.
The anchor scale here https://github.com/cccorn/AP-loss/blob/79ba97c0eba6f8654d13eabe208d7644f0da3313/lib/config.py#L16 is `[2**0, 2**(1.0/2.0)]`, not `[2**0, 2**(1.0/3.0), 2**(2.0/3.0)]` (see the quick comparison at the end of this comment).
The epoch count here https://github.com/cccorn/AP-loss/blob/79ba97c0eba6f8654d13eabe208d7644f0da3313/lib/config.py#L21 is 100, but the standard COCO schedule trains for only about 12 epochs.
The input size here https://github.com/cccorn/AP-loss/blob/79ba97c0eba6f8654d13eabe208d7644f0da3313/lib/config.py#L12 is 512, not [800, 1333].
Could you kindly explain these choices?
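For reference, a rough comparison of what the two anchor-scale settings imply for the number of anchors per feature-map location (a small standalone sketch, not code from the repo; the usual 3 aspect ratios are assumed):

```python
# Standalone sketch (not from the repo): anchors per feature-map location
# implied by the two anchor-scale settings, assuming the usual 3 aspect ratios.
ratios = [0.5, 1.0, 2.0]

ap_loss_scales = [2 ** 0, 2 ** (1.0 / 2.0)]                       # as in lib/config.py
retinanet_scales = [2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]   # RetinaNet paper default

print(len(ratios) * len(ap_loss_scales))    # 6 anchors per location
print(len(ratios) * len(retinanet_scales))  # 9 anchors per location
```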