
Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector

[Project Page] [PDF]

This project hosts the code for the implementation of Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector (ECCV 2020).

Introduction

A domain adaptive object detector aims to adapt itself to unseen domains that may contain variations in object appearance, viewpoints, or backgrounds. Most existing methods adopt feature alignment either on the image level or the instance level. However, image-level alignment on global features may tangle foreground and background pixels at the same time, while instance-level alignment using proposals may suffer from background noise.

Different from existing solutions, we propose a domain adaptation framework that accounts for each pixel via predicting pixel-wise objectness and centerness. Specifically, the proposed method carries out center-aware alignment by paying more attention to foreground pixels, hence achieving better adaptation across domains. To better align features across domains, we develop a center-aware alignment method that guides the alignment process to focus on these foreground pixels.

We demonstrate our method on numerous adaptation settings with extensive experimental results and show favorable performance against existing state-of-the-art algorithms.

Installation

Check INSTALL.md for installation instructions.

The implementation of our anchor-free detector is heavily based on FCOS (#f0a9731).

Dataset

All details of dataset construction can be found in Sec 4.2 of our paper.

We construct the training and testing sets under the following three adaptation settings: Cityscapes -> Foggy Cityscapes, Sim10k -> Cityscapes, and KITTI -> Cityscapes.

After the preparation, the dataset should be stored as follows:

[DATASET_PATH]
└─ Cityscapes
   └─ cocoAnnotations
   └─ leftImg8bit
      └─ train
      └─ val
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ KITTI
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ Sim10k
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages

Format and Path

Before training, please check paths_catalog.py and enter the correct data path for each dataset (Cityscapes, Foggy Cityscapes, Sim10k, and KITTI).

For example, if the datasets are stored in the layout shown above, the corresponding entries in paths_catalog.py should point to those folders.
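The snippet below is a minimal sketch of what the relevant entries might look like, assuming the DatasetCatalog layout inherited from FCOS/maskrcnn-benchmark; the dataset keys and annotation file names are illustrative placeholders and should match the names referenced by your config files.

```python
# Illustrative sketch only: dataset keys and annotation file names are placeholders.
class DatasetCatalog(object):
    DATA_DIR = "[DATASET_PATH]"  # root folder from the layout above

    DATASETS = {
        "cityscapes_train_cocostyle": {
            "img_dir": "Cityscapes/leftImg8bit/train",
            "ann_file": "Cityscapes/cocoAnnotations/cityscapes_train_cocostyle.json",
        },
        "cityscapes_foggy_val_cocostyle": {
            "img_dir": "Cityscapes/leftImg8bit_foggy/val",
            "ann_file": "Cityscapes/cocoAnnotations/cityscapes_foggy_val_cocostyle.json",
        },
        # Entries for Sim10k and KITTI follow the same pattern, pointing at
        # their Annotations / ImageSets / JPEGImages folders.
    }
```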

(Optional) Format Conversion

If you want to construct the dataset and convert data format manually, here are some useful links:

Training

To reproduce our experimental results, we recommend training the model with the following steps.

Let's take Cityscapes -> Foggy Cityscapes as an example.

1. Pre-training with the GA module only

Run the provided bash files directly, or type the training commands yourself:
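For reference, here is a hedged sketch of a single-GPU pre-training command. The script path mirrors tools/test_net.py used below, and the GA-only config name is an assumption patterned after the GA+CA config mentioned in the Evaluation section; adjust them to your checkout.

```bash
# Sketch: single-GPU GA-only pre-training (config name and output path are assumptions).
python tools/train_net_da.py \
    --config-file configs/da_ga_cityscapes_VGG_16_FPN_4x.yaml \
    OUTPUT_DIR [OUTPUT_PATH]
```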

2. Training with both the GA and CA modules

First, set MODEL.WEIGHT to the path of the pre-trained weight in L5 of the config file (example).

Next, the model can be trained with the following commands:

Run the provided bash files directly, or type the training commands yourself:
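As a sketch under the same assumptions as above, a single-GPU command using the GA+CA config referenced in the Evaluation section might look as follows; passing MODEL.WEIGHT on the command line (as the evaluation example does) is an alternative to editing L5 of the config.

```bash
# Sketch: single-GPU GA+CA training, initialized from the GA-only checkpoint.
python tools/train_net_da.py \
    --config-file configs/da_ga_ca_cityscapes_VGG_16_FPN_4x.yaml \
    MODEL.WEIGHT [PRETRAINED_GA_WEIGHT_PATH] \
    OUTPUT_DIR [OUTPUT_PATH]
```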

Note that the optimizer and scheduler will not be loaded from the pre-trained weight in the default setting. You can set load_opt_sch to True in train_net_da.py to change this behavior.

Evaluation

The trained model can be evaluated with the following command:

python tools/test_net.py \
    --config-file [CONFIG_PATH] \
    MODEL.WEIGHT [WEIGHT_PATH] \
    TEST.IMS_PER_BATCH 4

For example, the following command evaluates the model weight vgg_cs.pth for Cityscapes -> Foggy Cityscapes using the VGG-16 backbone.

python tools/test_net.py \
    --config-file configs/da_ga_ca_cityscapes_VGG_16_FPN_4x.yaml \
    MODEL.WEIGHT "vgg_cs.pth" \
    TEST.IMS_PER_BATCH 4

Note that the commands for evaluation are completely derived from FCOS (#f0a9731).

Please see here for more details.

Result

We provide the experimental results and model weights in this section.

| Dataset | Backbone | mAP | mAP@0.50 | mAP@0.75 | mAP@S | mAP@M | mAP@L | Model | Result |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cityscapes -> Foggy Cityscapes | VGG-16 | 19.6 | 36.0 | 18.1 | 2.8 | 17.9 | 38.1 | link | link |
| Sim10k -> Cityscapes | VGG-16 | 25.2 | 49.0 | 24.8 | 6.0 | 27.8 | 51.0 | link | link |
| KITTI -> Cityscapes | VGG-16 | 18.2 | 44.3 | 10.8 | 6.2 | 22.0 | 37.1 | link | link |

*Since the original model weight for the KITTI dataset is currently inaccessible, we re-ran the experiment and provide a similar (and even better) result in the table.

Note that we use 4 GPUs for faster training. For a fair comparison, we also report the results obtained using only one GPU.

| Dataset | Backbone | mAP | mAP@0.50 | mAP@0.75 | mAP@S | mAP@M | mAP@L | Model | Result |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sim10k -> Cityscapes | VGG-16 | 28.2 | 49.7 | 27.8 | 6.3 | 30.6 | 57.0 | link | link |

Environments

Citations

Please consider citing our paper in your publications if the project helps your research.

@inproceedings{hsu2020epm,
  title     = {Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector},
  author    = {Cheng-Chun Hsu and Yi-Hsuan Tsai and Yen-Yu Lin and Ming-Hsuan Yang},
  booktitle = {European Conference on Computer Vision},
  year      = {2020}
}