IDEA-Research / X-Pose

[ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints"
Other
521 stars 25 forks source link

X-Pose: Detecting Any Keypoints

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unipose-detecting-any-keypoints/2d-human-pose-estimation-on-human-art)](https://paperswithcode.com/sota/2d-human-pose-estimation-on-human-art?p=unipose-detecting-any-keypoints) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unipose-detecting-any-keypoints/2d-pose-estimation-on-animal-kingdom)](https://paperswithcode.com/sota/2d-pose-estimation-on-animal-kingdom?p=unipose-detecting-any-keypoints) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unipose-detecting-any-keypoints/2d-pose-estimation-on-300w)](https://paperswithcode.com/sota/2d-pose-estimation-on-300w?p=unipose-detecting-any-keypoints) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unipose-detecting-any-keypoints/2d-pose-estimation-on-macaquepose)](https://paperswithcode.com/sota/2d-pose-estimation-on-macaquepose?p=unipose-detecting-any-keypoints) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unipose-detecting-any-keypoints/2d-pose-estimation-on-desert-locust)](https://paperswithcode.com/sota/2d-pose-estimation-on-desert-locust?p=unipose-detecting-any-keypoints) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unipose-detecting-any-keypoints/2d-pose-estimation-on-vinegar-fly)](https://paperswithcode.com/sota/2d-pose-estimation-on-vinegar-fly?p=unipose-detecting-any-keypoints) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unipose-detecting-any-keypoints/multi-person-pose-estimation-on-coco)](https://paperswithcode.com/sota/multi-person-pose-estimation-on-coco?p=unipose-detecting-any-keypoints) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unipose-detecting-any-keypoints/animal-pose-estimation-on-ap-10k)](https://paperswithcode.com/sota/animal-pose-estimation-on-ap-10k?p=unipose-detecting-any-keypoints) ***Online demo***:[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/IDEA-Research/IDEA) ***Quick Checkpoint download***:[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/header/openxlab_models.svg)](https://openxlab.org.cn/models/detail/IDEA-Research/UniPose) #### [Project Page](https://yangjie-cv.github.io/X-Pose/) | [Paper](http://arxiv.org/abs/2310.08530) | [UniKPT Dataset](https://drive.google.com/file/d/1ukLPbTpTfrCQvRY2jY52CgRi-xqvyIsP/view) |[Video](https://github.com/IDEA-Research/UniPose) [Jie Yang1,2](https://yangjie-cv.github.io/), [Ailing Zeng1](https://ailingzeng.site/), [Ruimao Zhang2](http://www.zhangruimao.site/), [Lei Zhang1](https://www.leizhang.org/) 1[International Digital Economy Academy](https://www.idea.edu.cn/research/cvr.html) 2[The Chinese University of Hong Kong, Shenzhen](https://www.cuhk.edu.cn/en)

🤩 News

In-the-wild Test via X-Pose

X-Pose has strong fine-grained localization and generalization abilities across image styles, categories, and poses.


Detecting any Face Keypoints:


🗒 TODO

💡 Overview

• X-Pose is the first end-to-end prompt-based keypoint detection framework.


• It supports multi-modality prompts, including textual and visual prompts to detect arbitrary keypoints (e.g., from articulated, rigid, and soft objects).

Visual Prompts as Inputs:


Textual Prompts as Inputs:


🔨 Environment Setup

  1. Clone this repo

    git clone https://github.com/IDEA-Rensearch/X-Pose.git
    cd X-Pose
  2. Install the needed packages

    pip install -r requirements.txt
  3. Compiling CUDA operators

    cd models/UniPose/ops
    python setup.py build install
    # unit test (should see all checking is True)
    python test.py
    cd ../../..

▶ Demo

1. Guidelines

• We have released the textual prompt-based branch for inference. As the visual prompt involves a substantial amount of user input, we are currently exploring more user-friendly platforms to support this functionality.

• Since X-Pose has learned strong structural prior, it's best to use the predefined skeleton as the keypoint textual prompts, which are shown in predefined_keypoints.py.

• If users don't provide a keypoint prompt, we'll try to match the appropriate skeleton based on the user's instance category. If unsuccessful, we'll default to using the animal's skeleton, which covers a wider range of categories and testing requirements.

2. Run

Replace {GPU ID}, image_you_want_to_test.jpg, and "dir you want to save the output" with appropriate values in the following command

CUDA_VISIBLE_DEVICES={GPU ID} python inference_on_a_image.py \
-c config/UniPose_SwinT.py \
-p weights/unipose_swint.pth \
-i image_you_want_to_test.jpg \
-o "dir you want to save the output" \
-t "instance categories" \ (e.g., "person", "face", "left hand", "horse", "car", "skirt", "table")
-k "keypoint_skeleton_text" (If necessary, please select an option from the 'predefined_keypoints.py' file.)

We also support the inference using gradio.

python app.py

Checkpoints

name backbone Keypoint AP on COCO Checkpoint Config
1 X-Pose Swin-T 74.4 Google Drive / OpenXLab GitHub Link
2 X-Pose Swin-L 76.8 Coming Soon Coming Soon

The UniKPT Dataset


Datasets KPT Class Images Instances Unify Images Unify Instance
COCO 17 1 58,945 156,165 58,945 156,165
300W-Face 68 1 3,837 4,437 3,837 4,437
OneHand10K 21 1 11,703 11,289 2,000 2000
Human-Art 17 1 50,000 123,131 50,000 123,131
AP-10K 17 54 10,015 13,028 10,015 13,028
APT-36K 17 30 36,000 53,006 36,000 53,006
MacaquePose 17 1 13,083 16,393 2,000 2,320
Animal Kingdom 23 850 33,099 33,099 33,099 33,099
AnimalWeb 9 332 22,451 21,921 22,451 21,921
Vinegar Fly 31 1 1,500 1,500 1,500 1,500
Desert Locust 34 1 700 700 700 700
Keypoint-5 55/31 5 8,649 8,649 2,000 2,000
MP-100 561/293 100 16,943 18,000 16,943 18,000
UniKPT 338 1237 - - 226,547 418,487

• UniKPT is a unified dataset from 13 existing datasets, which is only for non-commercial research purposes.

• All images included in the UniKPT dataset originate from the datasets listed in the table above. To access these images, please download them from the original repository.

• We provide the annotations with precise keypoints' textual descriptions for effective training. More conveniently, you can find the text annotations in the link.

Citing X-Pose

If you find this repository useful for your work, please consider citing it as follows:

@article{xpose,
  title={X-Pose: Detection Any Keypoints},
  author={Yang, Jie and Zeng, Ailing and Zhang, Ruimao and Zhang, Lei},
  journal={ECCV},
  year={2024}
}
@inproceedings{yang2023neural,
  title={Neural Interactive Keypoint Detection},
  author={Yang, Jie and Zeng, Ailing and Li, Feng and Liu, Shilong and Zhang, Ruimao and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15122--15132},
  year={2023}
}
@inproceedings{yang2022explicit,
  title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
  author={Yang, Jie and Zeng, Ailing and Liu, Shilong and Li, Feng and Zhang, Ruimao and Zhang, Lei},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2022}
}