TL;DR

For long-tailed datasets, where the number of samples per class varies greatly, they propose an online data augmentation method that mixes rare categories and frequent categories in feature space. The method greatly improves accuracy compared to existing methods.

Why it matters:

Paper URL

https://arxiv.org/abs/2008.03673

Submission Date (yyyy/mm/dd)

2020/08/09

Authors and institutions

Peng Chu, Xiao Bian, Shaopeng Liu, Haibin Ling

Temple University
Google
GE Research
Stony Brook University

Methods

They use a two-stage approach. In the first stage, the model is trained as an ordinary classification problem. In the second stage, they create augmented rare-category samples and fine-tune the model using the features after Global Average Pooling (GAP). To create these samples, they treat the features of frequent-class data that are easily confused with a rare class as "class-generic features". These are features unrelated to the class decision, which do not form the basis of the prediction when viewed with a class activation map (CAM). They then combine them with the rare data's "class-specific features", which do form the basis of the prediction in the CAM, to synthesize new rare-class samples. Performing this augmentation online greatly improves accuracy.
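The augmentation step described above can be sketched in NumPy. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes three things: the class-specific region is where the min-max normalized CAM is at or above a threshold (`threshold=0.5` is an arbitrary choice), the class-generic part is the rest of the feature map, and the "easily confused" frequent class is the one whose classifier weights score highest on the rare sample's GAP feature. The function names (`class_activation_map`, `decompose`, `most_confused_frequent`, `augment_rare`) are illustrative only.

```python
import numpy as np


def class_activation_map(feature_map, fc_weight, class_idx):
    """CAM for one class: channel-weighted sum of the last conv feature map.

    feature_map: (C, H, W) output of the last conv layer (before GAP).
    fc_weight:   (num_classes, C) weights of the linear classifier after GAP.
    Returns an (H, W) map.
    """
    return np.tensordot(fc_weight[class_idx], feature_map, axes=(0, 0))


def decompose(feature_map, fc_weight, class_idx, threshold=0.5):
    """Split the GAP feature into class-specific and class-generic parts.

    ASSUMPTION (not necessarily the paper's rule): locations where the
    normalized CAM is >= threshold are class-specific, the rest are
    class-generic. Both parts are divided by H*W, so their sum equals
    the plain GAP feature.
    """
    cam = class_activation_map(feature_map, fc_weight, class_idx)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    mask = (cam >= threshold).astype(feature_map.dtype)  # (H, W), 1 = specific
    hw = feature_map.shape[1] * feature_map.shape[2]
    specific = (feature_map * mask).sum(axis=(1, 2)) / hw
    generic = (feature_map * (1.0 - mask)).sum(axis=(1, 2)) / hw
    return specific, generic


def most_confused_frequent(rare_gap, fc_weight, frequent_classes):
    """HYPOTHETICAL criterion: the frequent class scoring highest on the rare feature."""
    scores = fc_weight[frequent_classes] @ rare_gap
    return int(frequent_classes[int(np.argmax(scores))])


def augment_rare(rare_map, rare_cls, frequent_map, frequent_cls, fc_weight,
                 threshold=0.5):
    """New rare-class feature = rare class-specific part + frequent class-generic part."""
    rare_specific, _ = decompose(rare_map, fc_weight, rare_cls, threshold)
    _, frequent_generic = decompose(frequent_map, fc_weight, frequent_cls, threshold)
    return rare_specific + frequent_generic  # label this vector as rare_cls


# Toy usage with random tensors standing in for real CNN features.
rng = np.random.default_rng(0)
C, H, W, K = 8, 7, 7, 5  # channels, height, width, number of classes
fc_weight = rng.normal(size=(K, C))
rare_map = rng.random((C, H, W))
frequent_map = rng.random((C, H, W))
frequent_classes = np.array([0, 1, 2])
rare_cls = 4

freq_cls = most_confused_frequent(rare_map.mean(axis=(1, 2)), fc_weight, frequent_classes)
new_feature = augment_rare(rare_map, rare_cls, frequent_map, freq_cls, fc_weight)
print(freq_cls, new_feature.shape)  # new_feature.shape == (8,)
```

Because both parts are pooled over the full map, the specific and generic parts of one sample add up to its ordinary GAP feature, so the synthetic vector stays roughly on the scale the classifier was trained on. In an online setup, a function like `augment_rare` would be called on each mini-batch during second-stage fine-tuning, with the result labeled as the rare class.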

Results

The results show that fine-tuning with the augmented features greatly improves accuracy. In addition, on long-tailed datasets the method achieves significantly higher accuracy than Focal Loss and other existing methods.

Comments

ECCV2020


Feature Space Augmentation for Long-Tailed Data #43