By the way, I understand that some of these settings are legacy from SSD, but since you use RetinaNet in your paper, I really could not see the point of keeping them.
Sorry for the confusion. AP-loss suffers from high computational complexity, so we use only 2 anchor scales (which should not have a big impact on the final performance, as reported in the Focal Loss paper) and a smaller, fixed input size (512×512) for training images. These settings significantly reduce the memory cost, which lets us use a larger batch size, and they also speed up training. We also use an SSD-like augmentation strategy (for better performance). Consequently, the number of training epochs needs to be much larger than 12 to fully train the model.
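For concreteness, these settings roughly correspond to a config like the following (a minimal sketch; the key names are illustrative, the actual entries live in `lib/config.py`):

```python
# Minimal sketch of the training settings described above.
# Key names are illustrative; the real values live in lib/config.py.
config = {
    'anchor_scales': [2 ** 0, 2 ** (1.0 / 2.0)],  # 2 scales instead of RetinaNet's 3
    'input_size': 512,        # fixed 512x512 training input instead of [800, 1333] resizing
    'epochs': 100,            # much longer schedule than the standard 1x (~12 epochs)
    'batch_size_per_gpu': 8,  # larger per-GPU batch, feasible thanks to the smaller input
    'augmentation': 'ssd',    # SSD-style augmentation for better performance
}
```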
Thanks for your reply! I tried a standard 1x RetinaNet setting on COCO with your AP-loss, and it does not improve the performance. Is this caused by the shorter training time? (I mean, maybe AP-loss can raise the upper bound, but not within a 1x schedule.)
I am not very sure. I can think of a few possible reasons:
1. Batch size. What is the batch size on one GPU? This matters because the ranking is computed separately on each GPU device.
2. Learning rate. Since the training time is shorter, did you use a larger learning rate (e.g. 0.01)?
3. Training time. As you said, maybe a 1x schedule is not enough for AP-loss to fully train the model. How does AP-loss perform on the training set (compared to Focal Loss)?
With my setting, I got a RetinaNet with 34.6 mmAP, while the mmAP of the standard RetinaNet is 35.9.
Maybe the difference comes from the smaller batch size. We have 8 images on one GPU, so the ranking is more stable than with a batch size of 2: each combination of images is effectively a new sample in the ranking sense, and a sample containing fewer images is more likely to be biased.
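To illustrate why the per-GPU batch size matters, here is a toy sketch (not the error-driven AP-loss update from the paper; the function and data are made up): the ranking pairs are formed only among the anchors of the images on one GPU, so a larger local batch gives many more positive-negative pairs and a less noisy ranking.

```python
import torch

def toy_ranking_loss(scores, labels):
    """Toy AP-style ranking loss over all anchors in the local (per-GPU) batch.

    Simplified illustration only; the actual AP-loss uses an error-driven
    update rather than this differentiable surrogate.
    scores: (N,) predicted scores for every anchor of every image on this GPU.
    labels: (N,) 1 for positive anchors, 0 for negative anchors.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Every (positive, negative) pair within the local batch contributes to the
    # ranking; with 8 images per GPU there are far more pairs than with 2,
    # so the estimate of the ranking error is much less noisy.
    diff = neg.unsqueeze(0) - pos.unsqueeze(1)   # (P, M) pairwise margins
    return torch.sigmoid(diff).mean()            # soft fraction of mis-ranked pairs

# Pairs are never formed across GPUs: images on other devices do not enter the
# ranking, which is why the per-GPU batch size (2 vs. 8) makes a difference.
```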
Hi, I am reading your repo, and some of the code really confused me.
The anchor scale here https://github.com/cccorn/AP-loss/blob/79ba97c0eba6f8654d13eabe208d7644f0da3313/lib/config.py#L16 is `[2**0, 2**(1.0/2.0)]`, not `[2**0, 2**(1.0/3.0), 2**(2.0/3.0)]` (see the quick comparison at the end of this comment).
The epoch count here https://github.com/cccorn/AP-loss/blob/79ba97c0eba6f8654d13eabe208d7644f0da3313/lib/config.py#L21 is 100, but the standard COCO schedule trains for only about 12 epochs.
The input size here https://github.com/cccorn/AP-loss/blob/79ba97c0eba6f8654d13eabe208d7644f0da3313/lib/config.py#L12 is 512, not [800, 1333].
Could you kindly explain these choices?
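For reference, a rough comparison of what the two anchor-scale settings imply for the number of anchors per feature-map location (a small standalone sketch, not code from the repo; the usual 3 aspect ratios are assumed):

```python
# Standalone sketch (not from the repo): anchors per feature-map location
# implied by the two anchor-scale settings, assuming the usual 3 aspect ratios.
ratios = [0.5, 1.0, 2.0]

ap_loss_scales = [2 ** 0, 2 ** (1.0 / 2.0)]                       # as in lib/config.py
retinanet_scales = [2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]   # RetinaNet paper default

print(len(ratios) * len(ap_loss_scales))    # 6 anchors per location
print(len(ratios) * len(retinanet_scales))  # 9 anchors per location
```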