A neat PyTorch implementation of WS-DAN (Weakly Supervised Data Augmentation Network) for FGVC (Fine-Grained Visual Classification). (Hu et al., "See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification", arXiv:1901.09891)
NOTICE: This is NOT an official implementation by authors of WS-DAN. The official implementation is available at tau-yihouxiang/WS_DAN (and there's another unofficial PyTorch version wvinzh/WS_DAN_PyTorch).
- Data Augmentation: Attention Cropping and Attention Dropping (a minimal sketch follows this list)
- Bilinear Attention Pooling (BAP) for Feature Generation
- Training and Testing Processes
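As a quick illustration of the two augmentations, here is a minimal PyTorch sketch of attention cropping ("look closer") and attention dropping ("see more") as described in the paper. It is not taken from this repo; the function names and the thresholds `theta_c`/`theta_d` are placeholders.

```python
import torch
import torch.nn.functional as F

def attention_crop(image, attention_map, theta_c=0.5):
    """Crop the region where the attention map exceeds theta_c * max, then resize back.

    image:         (C, H, W) tensor
    attention_map: (h, w) tensor for a single attention channel
    """
    # Upsample the attention map to the image resolution.
    attn = F.interpolate(attention_map[None, None], size=image.shape[-2:],
                         mode='bilinear', align_corners=False)[0, 0]
    mask = attn >= theta_c * attn.max()
    ys, xs = torch.nonzero(mask, as_tuple=True)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = image[:, y0:y1, x0:x1]
    # Resize the crop back to the input size so the network can "look closer".
    return F.interpolate(crop[None], size=image.shape[-2:],
                         mode='bilinear', align_corners=False)[0]

def attention_drop(image, attention_map, theta_d=0.5):
    """Zero out the region where the attention map exceeds theta_d * max ("see more")."""
    attn = F.interpolate(attention_map[None, None], size=image.shape[-2:],
                         mode='bilinear', align_corners=False)[0, 0]
    keep = (attn < theta_d * attn.max()).to(image.dtype)
    return image * keep
```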
Dataset | Object | Categories | Train Images | Test Images | Accuracy (Paper, %) | Accuracy (This Repo, %) | Feature Net |
---|---|---|---|---|---|---|---|
FGVC-Aircraft | Aircraft | 100 | 6,667 | 3,333 | 93.0 | 93.28 | inception_mixed_6e |
CUB-200-2011 | Bird | 200 | 5,994 | 5,794 | 89.4 | 88.28 | inception_mixed_6e |
Stanford Cars | Car | 196 | 8,144 | 8,041 | 94.5 | 94.38 | inception_mixed_6e |
Stanford Dogs | Dog | 120 | 12,000 | 8,580 | 92.2 | 89.66 | inception_mixed_7c |
This repo contains WS-DAN with feature extractors including VGG19 (`'vgg19'`, `'vgg19_bn'`), ResNet34/50/101/152 (`'resnet34'`, `'resnet50'`, `'resnet101'`, `'resnet152'`), and Inception v3 (`'inception_mixed_6e'`, `'inception_mixed_7c'`) in PyTorch form; see `./models/wsdan.py`.
```python
net = WSDAN(num_classes=num_classes, M=num_attentions, net='inception_mixed_6e', pretrained=True)
net = WSDAN(num_classes=num_classes, M=num_attentions, net='inception_mixed_7c', pretrained=True)
net = WSDAN(num_classes=num_classes, M=num_attentions, net='vgg19_bn', pretrained=True)
net = WSDAN(num_classes=num_classes, M=num_attentions, net='resnet50', pretrained=True)
```
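For orientation, Bilinear Attention Pooling multiplies each attention map element-wise with the backbone feature maps, pools the result spatially into one part feature per attention map, and stacks the part features into the feature matrix. Below is a minimal sketch of that idea; it is not the code in `./models/wsdan.py`, and the sign-sqrt/L2 normalization details are assumptions.

```python
import torch

def bilinear_attention_pooling(features, attentions, eps=1e-12):
    """features:   (B, C, H, W) backbone feature maps
    attentions: (B, M, H, W) attention maps
    returns:    (B, M * C) feature matrix
    """
    B, C, H, W = features.shape
    # Multiply every attention map with the feature maps and
    # global-average-pool over the spatial dimensions: (B, M, C).
    feature_matrix = torch.einsum('bmhw,bchw->bmc', attentions, features) / (H * W)
    # Sign-sqrt and L2 normalization, as commonly applied to bilinear features.
    feature_matrix = torch.sign(feature_matrix) * torch.sqrt(feature_matrix.abs() + eps)
    feature_matrix = torch.nn.functional.normalize(feature_matrix.reshape(B, -1), dim=-1)
    return feature_matrix
```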
FGVC-Aircraft (Aircraft)
```
-/FGVC-Aircraft/data/
└─── images
│    └─── 0034309.jpg
│    └─── 0034958.jpg
│    └─── ...
└─── variants.txt
└─── images_variant_trainval.txt
└─── images_variant_test.txt
```
CUB-200-2011 (Bird)
```
-/CUB-200-2011
└─── images.txt
└─── image_class_labels.txt
└─── train_test_split.txt
└─── images
     └─── 001.Black_footed_Albatross
     │    └─── Black_Footed_Albatross_0001_796111.jpg
     │    └─── ...
     └─── 002.Laysan_Albatross
     └─── ...
```
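To show how the CUB metadata files above fit together, here is a hypothetical minimal `Dataset` that joins `images.txt`, `image_class_labels.txt`, and `train_test_split.txt`; the actual loader lives in `datasets/bird_dataset.py` and may differ.

```python
import os
from PIL import Image
from torch.utils.data import Dataset

class CUBDataset(Dataset):
    """Minimal CUB-200-2011 reader (illustrative only; see datasets/bird_dataset.py)."""

    def __init__(self, root, train=True, transform=None):
        self.root, self.transform = root, transform
        # Each txt file maps an image id to a value: relative path, class id, or split flag.
        paths  = dict(l.split() for l in open(os.path.join(root, 'images.txt')) if l.strip())
        labels = dict(l.split() for l in open(os.path.join(root, 'image_class_labels.txt')) if l.strip())
        splits = dict(l.split() for l in open(os.path.join(root, 'train_test_split.txt')) if l.strip())
        wanted = '1' if train else '0'
        self.samples = [(paths[i], int(labels[i]) - 1)   # class ids are 1-based in the txt files
                        for i in paths if splits[i] == wanted]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        image = Image.open(os.path.join(self.root, 'images', path)).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```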
Stanford Cars (Car)
```
-/StanfordCars
└─── cars_test
│    └─── 00001.jpg
│    └─── 00002.jpg
│    └─── ...
└─── cars_train
│    └─── 00001.jpg
│    └─── 00002.jpg
│    └─── ...
└─── devkit
     └─── cars_train_annos.mat
     └─── cars_test_annos_withlabels.mat
```
Stanford Dogs (Dog)
```
-/StanfordDogs
└─── Images
│    └─── n02085620-Chihuahua
│    │    └─── n02085620_10074.jpg
│    │    └─── ...
│    └─── n02085782-Japanese_spaniel
│    └─── ...
└─── train_list.mat
└─── test_list.mat
```
`git clone` this repo.
Prepare the data and modify `DATAPATH` in `datasets/<abcd>_dataset.py`.
Set configurations in `config.py` (Training Config, Model Config, Dataset/Path Config), e.g.:

```python
tag = 'aircraft'  # 'aircraft', 'bird', 'car', or 'dog'
```
Run `$ nohup python3 train.py > progress.bar &` for training, and `$ tail -f progress.bar` to watch training progress (the `tqdm` package is required). Other logs are written to `<config.save_dir>/train.log`. A sketch of the three-view training step used by WS-DAN follows.
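For context, WS-DAN trains on three views of every image in each step: the raw image, an attention-cropped view, and an attention-dropped view, combining the three classification losses. The sketch below illustrates that loop; the `model(x)` return signature and the equal loss weighting are assumptions, and this is not the actual code in `train.py`.

```python
# Illustrative three-view training step (assumed interfaces; not the code in train.py).
# Assumes `model(x)` returns (logits, attention_maps) with attention_maps of shape
# (B, M, h, w), and reuses the attention_crop / attention_drop sketches from above.
import torch
import torch.nn.functional as F

def train_step(model, images, labels, optimizer):
    optimizer.zero_grad()

    # 1) Raw images ("see better").
    logits_raw, attn = model(images)

    # Pick one attention map per image to guide cropping/dropping
    # (the paper samples a map at random; channel 0 is used here for brevity).
    guide = attn[:, 0]

    # 2) Attention cropping ("look closer").
    cropped = torch.stack([attention_crop(x, a) for x, a in zip(images, guide)])
    logits_crop, _ = model(cropped)

    # 3) Attention dropping ("see more" parts).
    dropped = torch.stack([attention_drop(x, a) for x, a in zip(images, guide)])
    logits_drop, _ = model(dropped)

    loss = (F.cross_entropy(logits_raw, labels)
            + F.cross_entropy(logits_crop, labels)
            + F.cross_entropy(logits_drop, labels)) / 3
    loss.backward()
    optimizer.step()
    return loss.item()
```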
Set configurations in `config.py` (Eval Config) and run `$ python3 eval.py` for evaluation and visualization. Code in `eval.py` helps generate attention maps (Image, Heat Attention Map, Image × Attention Map).
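As a rough idea of how such visualizations can be produced (independently of `eval.py`), the sketch below overlays a normalized attention map on an image using an OpenCV colormap; the function name and normalization are illustrative only.

```python
import cv2
import numpy as np

def overlay_attention(image_bgr, attention_map, alpha=0.5):
    """image_bgr:     (H, W, 3) uint8 image
    attention_map: (h, w) float array (e.g. one attention channel)
    returns (heatmap, overlay) as uint8 BGR images.
    """
    h, w = image_bgr.shape[:2]
    # Resize the attention map to the image size and normalize it to [0, 1].
    attn = cv2.resize(attention_map.astype(np.float32), (w, h))
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-12)
    # Colorize the attention map and blend it with the original image.
    heatmap = cv2.applyColorMap(np.uint8(255 * attn), cv2.COLORMAP_JET)
    overlay = cv2.addWeighted(image_bgr, 1 - alpha, heatmap, alpha, 0)
    return heatmap, overlay
```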