This codebase replicates results for pedestrian detection with domain shifts on the BDD100k dataset, following the CVPR 2019 paper Automatic adaptation of object detectors to new domains using self-training. We provide trained models, train and eval scripts as well as splits of the dataset for download. More details are available on the project page.
This repository is based heavily on A Pytorch Implementation of Detectron. We modify it for experiments on domain adaptation of face and pedestrian detectors.
If you find this codebase useful, please consider citing:
@inproceedings{roychowdhury2019selftrain,
Author = {Aruni RoyChowdhury and Prithvijit Chakrabarty and Ashish Singh and SouYoung Jin and Huaizu Jiang and Liangliang Cao and Erik Learned-Miller},
Title = {Automatic adaptation of object detectors to new domains using self-training},
Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
Year = {2019}
}
Clone the repo:
git clone git@github.com:AruniRC/detectron-self-train.git
Tested under python3.
This walkthrough describes setting up this Detectron repo. The detailed instructions are in INSTALL.md.
Create a data folder under the repo root:
cd {repo_root}
mkdir data
Our pedestrian detection task uses both labeled and unlabeled data from the Berkeley Deep Drive BDD-100k dataset. Please register and download the dataset from their website. We use a symlink from our project root, `data/bdd100k`, to link to the location of the downloaded dataset. The folder structure should be like this:
    data/bdd100k/
        images/
            test/
            train/
            val/
        labels/
            train/
            val/
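The symlink and a quick layout check can be sketched in Python as follows. This is a minimal sketch and not part of the repository: `link_dataset` and `missing_subdirs` are hypothetical helper names, and the download location passed to `link_dataset` is whatever path you chose locally.

```python
import os
from pathlib import Path

# Sub-folders listed in the layout above.
EXPECTED_SUBDIRS = [
    "images/test", "images/train", "images/val",
    "labels/train", "labels/val",
]

def link_dataset(download_dir, link_path="data/bdd100k"):
    """Create a symlink at link_path pointing to download_dir (hypothetical helper)."""
    link = Path(link_path)
    link.parent.mkdir(parents=True, exist_ok=True)
    # Leave any existing file, folder, or symlink at link_path untouched.
    if not link.exists() and not link.is_symlink():
        os.symlink(Path(download_dir).resolve(), link, target_is_directory=True)
    return link

def missing_subdirs(root="data/bdd100k"):
    """Return the expected sub-folders that are absent under root."""
    return [d for d in EXPECTED_SUBDIRS if not (Path(root) / d).is_dir()]
```

Run from the repo root, `link_dataset("<your-download-dir>")` followed by `missing_subdirs()` should return an empty list once the dataset is fully unpacked.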
BDD-100k takes about 6.5 GB of disk space. The 100k unlabeled videos take 234 GB, but you do not need to download them: we have already run the hard example mining on them, and the extracted frames (with pseudo-labels) are available for download.
Mining the hard positives ("HPs") involves detecting pedestrians and forming tracklets on 100K videos. This was done on the UMass GPU Cluster and took about a week. We do not include this pipeline here (yet) -- the mined video frames and annotations are available for download as a gzipped tarball from here. NOTE: this is a large download (23 GB). The data retains the permissions and licensing associated with the BDD-100K dataset (we make the video frames available here for ease of research).
Now we create a symlink to the untarred BDD HPs from the project data folder, which should have the following structure: `data/bdd_peds_HP18k/*.jpg`. The image naming convention is `<video-name>_<frame-number>.jpg`.
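The naming convention makes it possible to recover the source video and frame index from a file name. The sketch below assumes that the frame number is the text after the last underscore and consists only of digits; `split_frame_name` is a hypothetical helper, and the file name in the usage note is made up for illustration.

```python
from pathlib import Path

def split_frame_name(filename):
    """Split '<video-name>_<frame-number>.jpg' into (video_name, frame_number).

    Assumption: the frame number follows the LAST underscore, so video
    names that themselves contain underscores are kept whole.
    """
    stem = Path(filename).stem
    video_name, _, frame = stem.rpartition("_")
    if not video_name or not frame.isdigit():
        raise ValueError(f"unexpected file name: {filename}")
    return video_name, int(frame)
```

For example, `split_frame_name("data/bdd_peds_HP18k/some_video-01_000045.jpg")` gives the video name `some_video-01` and the frame number 45.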
All the annotations are assumed to be downloaded inside a folder `data/bdd_jsons` relative to the project root: `data/bdd_jsons/*.json`. We use symlinks here as well, in case the JSONs are kept in some other location.
Data Split | JSON | Dataset name | Image Dir. |
---|---|---|---|
BDD_Source_Train | bdd_peds_train.json | bdd_peds_train | data/bdd100k |
BDD_Source_Val | bdd_peds_val.json | bdd_peds_val | data/bdd100k |
BDD_Target_Train | bdd_peds_not_clear_any_daytime_train.json | bdd_peds_not_clear_any_daytime_train | data/bdd100k |
BDD_Target_Val | bdd_peds_not_clear_any_daytime_val.json | bdd_peds_not_clear_any_daytime_val | data/bdd100k |
BDD_dets | bdd_dets18k.json | DETS18k | data/bdd_peds_HP18k |
BDD_HP | bdd_HP18k.json | HP18k | data/bdd_peds_HP18k |
BDD_score_remap | bdd_HP18k_remap_hist.json | HP18k_remap_hist | data/bdd_peds_HP18k |
BDD_target_GT | bdd_target_labeled.json | bdd_peds_not_clear_any_daytime_train_100 | data/bdd100k |
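Before training, it can help to confirm that every annotation file from the table is in place. The sketch below simply mirrors the table in a dictionary; it is not the repository's own dataset registry, and `DATASETS` and `missing_annotations` are hypothetical names.

```python
from pathlib import Path

# Dataset name -> (annotation JSON, image directory), copied from the table above.
DATASETS = {
    "bdd_peds_train": ("bdd_peds_train.json", "data/bdd100k"),
    "bdd_peds_val": ("bdd_peds_val.json", "data/bdd100k"),
    "bdd_peds_not_clear_any_daytime_train": (
        "bdd_peds_not_clear_any_daytime_train.json", "data/bdd100k"),
    "bdd_peds_not_clear_any_daytime_val": (
        "bdd_peds_not_clear_any_daytime_val.json", "data/bdd100k"),
    "DETS18k": ("bdd_dets18k.json", "data/bdd_peds_HP18k"),
    "HP18k": ("bdd_HP18k.json", "data/bdd_peds_HP18k"),
    "HP18k_remap_hist": ("bdd_HP18k_remap_hist.json", "data/bdd_peds_HP18k"),
    "bdd_peds_not_clear_any_daytime_train_100": (
        "bdd_target_labeled.json", "data/bdd100k"),
}

def missing_annotations(json_dir="data/bdd_jsons"):
    """Return the dataset names whose JSON file is not found under json_dir."""
    return sorted(
        name for name, (json_name, _) in DATASETS.items()
        if not (Path(json_dir) / json_name).is_file()
    )
```

Calling `missing_annotations()` from the project root returns an empty list when all eight JSON files are present.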
Use the environment variable `CUDA_VISIBLE_DEVICES` to control which GPUs to use. All the training scripts are run with 4 GPUs. The trained model checkpoints can be downloaded from the links under the column Model weights. The eval scripts need to be modified to point to where the corresponding model checkpoints have been downloaded locally. To be consistent, we suggest creating a folder under the project root like `data/bdd_pre_trained_models` and saving all the models under it.
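As an illustration of the GPU selection, the sketch below builds an environment for a child process with `CUDA_VISIBLE_DEVICES` set to a comma-separated list of device indices. `env_with_gpus` is a hypothetical helper, not something the repository provides.

```python
import os

def env_with_gpus(gpu_ids):
    """Return a copy of the current environment restricted to the given GPU ids."""
    env = os.environ.copy()
    # CUDA reads this variable as a comma-separated list of device indices.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching one of the train or eval scripts, for example with `env_with_gpus([0, 1, 2, 3])` to match the 4-GPU setup. The variable has to be set before the process initializes CUDA.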
The performance numbers shown are from single models (the same models available for download), while the tables in the paper show results averaged across 5 rounds of train/test.
Method | Model weights | Config YAML | Train script | Eval script | AP, AR |
---|---|---|---|---|---|
Baseline | bdd_baseline | cfg | train | eval | 15.21, 33.09 |
Dets | bdd_dets | cfg | train | eval | 27.55, 56.90 |
HP | bdd_hp | cfg | train | eval | 28.34, 58.04 |
HP-constrained | bdd_hp-cons | cfg | train | eval | 29.57, 56.48 |
HP-score-remap | bdd_score-remap | cfg | train | eval | 28.11, 56.80 |
DA-im | bdd_da-im | cfg | train | eval | 25.71, 56.29 |
Src-Target-GT | bdd_target-gt | cfg | train | eval | 35.40, 66.26 |
[Figure: sample detections from the HP-constrained and Baseline models]
The folder `gypsum/scripts/demo` contains two shell scripts that run the pre-trained Baseline (BDD-Source trained) and HP-constrained (domain adapted to BDD Target) models on a sample image. Please change the `MODEL_PATH` variable in these scripts to where the appropriate models have been downloaded locally. Your results should resemble the example shown above. Note that the domain adapted model (HP-constrained) detects pedestrians with higher confidence (visualization threshold is 0.9 on the confidence score), while making one false positive in the background.
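The effect of the visualization threshold can be illustrated with a small filter. This is a sketch under assumptions: each detection is taken to be a tuple `(x1, y1, x2, y2, score)`, which may differ from the format the demo scripts use, and whether the demo keeps scores equal to the threshold is not specified here.

```python
def filter_detections(detections, threshold=0.9):
    """Keep detections whose confidence score is at least threshold.

    Assumption: each detection is (x1, y1, x2, y2, score); the
    repository's own output format may differ.
    """
    return [d for d in detections if d[4] >= threshold]
```

With a threshold of 0.9, a box scored 0.95 is drawn while one scored 0.42 is dropped, which is why a model that assigns higher confidence to true pedestrians shows more boxes in the visualization.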
This material is based on research sponsored by the AFRL and DARPA under agreement number FA8750-18-2-0126. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the AFRL and DARPA or the U.S. Government. We acknowledge support from the MassTech Collaborative grant for funding the UMass GPU cluster. We thank Tsung-Yu Lin and Subhransu Maji for helpful discussions.
We appreciate the well-organized and accurate codebase for the Detectron implementation in PyTorch from the creators of A Pytorch Implementation of Detectron. Also thanks to the creators of BDD-100k which has allowed us to share our pseudo-labeled video frames for our academic, non-commercial purpose of quickly reproducing results.