# DisAlign

Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition" (CVPR 2021) [[Paper]](https://arxiv.org/abs/2103.16370) [[Code]](https://github.com/Megvii-BaseDetection/DisAlign)

We implement the classification, object detection, and instance segmentation tasks based on our [cvpods](https://github.com/Megvii-BaseDetection/cvpods). Users should **install cvpods first** and then run the experiments in this repo.

# Changelog

- **7.28.2021** Update **DisAlign** on ImageNet-LT (ResX50)
- **4.23.2021** Update **DisAlign** on LVIS v0.5 (Mask R-CNN + Res50)
- **4.12.2021** Update the README

# 0. How to Use

- Step-1: Install the latest [cvpods](https://github.com/Megvii-BaseDetection/cvpods).
- Step-2: `cd cvpods`
- Step-3: Prepare the dataset for your task.
- Step-4: `git clone https://github.com/Megvii-BaseDetection/DisAlign playground_disalign`
- Step-5: Enter one experiment folder and run `pods_train --num-gpus 8`
- Step-6: Use `pods_test --num-gpus 8` to evaluate the last checkpoint

A condensed shell sketch of these steps is given in the appendix at the end of this README.

# 1. Image Classification

We support the following three datasets:

- ImageNet-LT Dataset
- iNaturalist-2018 Dataset
- Places-LT Dataset

We refer the user to [CLS_README](classification/README.md) for more details.

# 2. Object Detection/Instance Segmentation

We support the two versions of the LVIS dataset:

- LVIS v0.5
- LVIS v1.0

**Highlights**

1. To speed up the evaluation on the LVIS dataset, we provide a C++-optimized evaluation API by modifying the [coco_eval(C++)](https://github.com/Megvii-BaseDetection/cvpods/blob/master/cvpods/layers/csrc/cocoeval/cocoeval.cpp) in `cvpods`.
   - The C++ version of the lvis_eval API saves **~30% time** when calculating the mAP.
2. We provide support for the `AP_fixed` and `AP_pool` metrics proposed in [large-vocab-devil](https://github.com/achalddave/large-vocab-devil).
3. We will support more recent works on long-tail detection (e.g., EQLv2, CenterNet2) in this project in the future.

We refer the user to [DET_README](segmentation/README.md) for more details.

# 3. Semantic Segmentation

We adopt mmsegmentation as the codebase for running all semantic segmentation experiments of DisAlign. Currently, the user should use [DisAlign_Seg](TODO) for the semantic segmentation experiments. We will add support for these experiments in [cvpods](https://github.com/Megvii-BaseDetection/cvpods) in the future.

# Acknowledgement

Thanks to the following projects:

- [cvpods](https://github.com/Megvii-BaseDetection/cvpods)
- [Detectron2](https://github.com/facebookresearch/detectron2)
- [mmsegmentation](https://github.com/open-mmlab/mmsegmentation)
- [classifier-balancing](https://github.com/facebookresearch/classifier-balancing)

# Citing DisAlign

If you are using DisAlign in your research or wish to refer to the baseline results published in this repo, please use the following BibTeX entry.

```latex
@inproceedings{zhang2021disalign,
  title={Distribution Alignment: A Unified Framework for Long-tail Visual Recognition},
  author={Zhang, Songyang and Li, Zeming and Yan, Shipeng and He, Xuming and Sun, Jian},
  booktitle={CVPR},
  year={2021}
}
```

# License

This repo is released under the Apache 2.0 license. Please see the LICENSE file for more information.
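
# Appendix: Quick Start Sketch

The commands below condense Step-1 through Step-6 of "How to Use" into a single shell script. This is a minimal sketch, not a prescribed setup: it assumes a CUDA-ready Python environment with 8 GPUs, assumes cvpods installs as an editable pip package (follow the cvpods README if its install procedure differs), leaves dataset preparation (Step-3) as a placeholder comment, and uses an illustrative `<your_experiment_folder>` path rather than a folder name defined by this repo.

```bash
# Step-1/2: install the latest cvpods and enter its root directory
# (editable pip install is an assumption; see the cvpods README for the official procedure)
git clone https://github.com/Megvii-BaseDetection/cvpods.git
cd cvpods
pip install -e .

# Step-3: prepare the dataset for your task; this placeholder only marks where
# the task-specific setup described in CLS_README / DET_README happens.

# Step-4: clone DisAlign into cvpods as a playground
git clone https://github.com/Megvii-BaseDetection/DisAlign playground_disalign

# Step-5: enter one experiment folder (path is illustrative) and train on 8 GPUs
cd playground_disalign/<your_experiment_folder>
pods_train --num-gpus 8

# Step-6: evaluate the last checkpoint
pods_test --num-gpus 8
```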