WongKinYiu / CrossStagePartialNetworks

Cross Stage Partial Networks
https://github.com/WongKinYiu/CrossStagePartialNetworks

Mosaic Augmentation Paper? #8

Open glenn-jocher opened 4 years ago

glenn-jocher commented 4 years ago

@WongKinYiu @AlexeyAB great work on the README, there's a wealth of information here! I saw that the mosaic augmentation outperformed all of the rest in your readme tests, including more well known ones like cutmix.

Based on this I wonder if it's worth publishing a short paper to arxiv, and then we can link to this github, alexeyab/darknet and ultralytics/yolov3 for the mosaic dataloader code?

WongKinYiu commented 4 years ago

@glenn-jocher

For your reference: [image]

AlexeyAB commented 4 years ago

@glenn-jocher

> I saw that the mosaic augmentation outperformed all of the rest in your readme tests, including more well known ones like cutmix.

Yes. It seems that the more parts of different images are combined, the higher the accuracy. So maybe we can combine more than 4 images into one: 6, 8, 12 or 16.

> Based on this I wonder if it's worth publishing a short paper to arxiv, and then we can link to this github, alexeyab/darknet and ultralytics/yolov3 for the mosaic dataloader code?

Yes. It would also be interesting to know what effect this gives when training the detector.

glenn-jocher commented 4 years ago

@WongKinYiu wow, it's odd that Swish hurt the results while Mish helped a lot. Are you sure your Swish implementation was correct? Mish actually provided the largest improvement in your test; it's too bad it's so slow.

@AlexeyAB yes, I was thinking of increasing the 4 images to 9 (in a 3x3 grid), but there are some practical problems with overlap, how to align and build the 3x3 mosaic, etc. Right now the mosaic is very simple, since there is no overlap and the images all share a common vertex at the center. There would also be diminishing returns for having to load double the images, as currently the grey area is only maybe 10-20% of the image space. [image: train_batch0]
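For readers unfamiliar with the layout described above, here is a minimal NumPy sketch of a 2x2 mosaic in which the four images meet at one shared center vertex with no overlap. It is not the dataloader code from any of the repositories mentioned in this thread: the function name `mosaic4`, the grey fill value 114, and the range used for the random center are assumptions made for illustration, and bounding-box labels are not handled.

```python
import numpy as np

def mosaic4(images, s, rng=None, fill=114):
    """Place 4 HxWx3 uint8 images (each at most s x s) on a 2s x 2s canvas.

    The images meet at a random center (xc, yc); parts that would fall
    outside the canvas are cropped, uncovered canvas stays grey.
    Hypothetical helper, written only to illustrate the layout.
    """
    assert len(images) == 4
    rng = rng or np.random.default_rng()
    # Assumed range: keep the center inside the middle half of the canvas.
    xc, yc = (int(v) for v in rng.integers(s // 2, 3 * s // 2, size=2))
    canvas = np.full((2 * s, 2 * s, 3), fill, dtype=np.uint8)
    for i, img in enumerate(images):
        h, w = img.shape[:2]
        if i == 0:    # top-left: bottom-right corner of the image at the center
            x1, y1, x2, y2 = max(xc - w, 0), max(yc - h, 0), xc, yc
            sx, sy = w - (x2 - x1), h - (y2 - y1)
        elif i == 1:  # top-right: bottom-left corner at the center
            x1, y1, x2, y2 = xc, max(yc - h, 0), min(xc + w, 2 * s), yc
            sx, sy = 0, h - (y2 - y1)
        elif i == 2:  # bottom-left: top-right corner at the center
            x1, y1, x2, y2 = max(xc - w, 0), yc, xc, min(yc + h, 2 * s)
            sx, sy = w - (x2 - x1), 0
        else:         # bottom-right: top-left corner at the center
            x1, y1, x2, y2 = xc, yc, min(xc + w, 2 * s), min(yc + h, 2 * s)
            sx, sy = 0, 0
        # Copy the visible part of the image into its quadrant.
        canvas[y1:y2, x1:x2] = img[sy:sy + (y2 - y1), sx:sx + (x2 - x1)]
    return canvas, (xc, yc)
```

For detection, the box coordinates of each image would also need to be shifted by the same offsets and clipped to the canvas, which this sketch leaves out.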

AlexeyAB commented 4 years ago

@glenn-jocher

> Are you sure your Swish implementation was correct?

Yes, the Swish implementation is correct, since Swish improves accuracy for other models: EfficientNet, MixNet, PeleeNet, CSPPeleeNet, ... See https://github.com/AlexeyAB/darknet/issues/3994#issuecomment-565692356
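As a reference for the two activations being compared, here is a minimal NumPy sketch of their standard definitions, Swish as x * sigmoid(x) and Mish as x * tanh(softplus(x)). This is not the darknet implementation; the function names and the numerically stable formulations are choices made for this illustration.

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)), computed stably via logaddexp
    return np.logaddexp(0.0, x)

def swish(x):
    # x * sigmoid(x), using sigmoid(x) = 0.5 * (1 + tanh(x / 2)) to avoid overflow
    return x * 0.5 * (1.0 + np.tanh(0.5 * x))

def mish(x):
    # x * tanh(softplus(x))
    return x * np.tanh(softplus(x))

x = np.linspace(-3.0, 3.0, 7)
print(swish(x))
print(mish(x))
```

Mish composes a softplus with a tanh, so a naive implementation evaluates more transcendental functions per element than Swish, which may contribute to the speed difference mentioned earlier in the thread.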

> but there are some practical problems with overlap, how to align and build the 3x3 mosaic etc

Yes, there are problems for the detector, but the classifier has no problem using 3x3 instead of 2x2: https://github.com/AlexeyAB/darknet/issues/4432

glenn-jocher commented 4 years ago

@AlexeyAB ah yes, that's true. For classification tasks like imagenet the 3x3 mosaic is much simpler :)
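To illustrate why the classification case is simpler, here is a hedged NumPy sketch of an n x n grid mosaic: each image is resized to one equal cell, so there are no boxes to align. The function name `grid_mosaic`, the nearest-neighbour resize, and the area-weighted soft label (similar in spirit to how CutMix mixes labels) are all assumptions for this example, not the method used in darknet.

```python
import numpy as np

def grid_mosaic(images, labels, n, out_size, num_classes):
    """Tile n*n images into an n x n grid and build an area-weighted soft label.

    Hypothetical helper: the label mixing rule is an assumption.
    """
    assert len(images) == len(labels) == n * n
    cell = out_size // n
    canvas = np.zeros((cell * n, cell * n, 3), dtype=np.uint8)
    soft = np.zeros(num_classes, dtype=np.float32)
    for k, (img, lab) in enumerate(zip(images, labels)):
        h, w = img.shape[:2]
        # Nearest-neighbour resize to (cell, cell) by index sampling.
        rows = np.arange(cell) * h // cell
        cols = np.arange(cell) * w // cell
        r, c = divmod(k, n)  # grid position, row-major
        canvas[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = img[rows][:, cols]
        # Every cell covers the same area, so each label gets weight 1 / n^2.
        soft[lab] += 1.0 / (n * n)
    return canvas, soft
```

With `n=3` this gives the 3x3 layout discussed above, and `n=2` gives a fixed-center 2x2 variant.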

WongKinYiu commented 4 years ago

@glenn-jocher @AlexeyAB

A mosaic-like paper: https://arxiv.org/pdf/2004.12432.pdf