[Project Page](https://yangjie-cv.github.io/X-Pose/) | [Paper](http://arxiv.org/abs/2310.08530) | [UniKPT Dataset](https://drive.google.com/file/d/1ukLPbTpTfrCQvRY2jY52CgRi-xqvyIsP/view) | [Video](https://github.com/IDEA-Research/UniPose)
[Jie Yang](https://yangjie-cv.github.io/)<sup>1,2</sup>, [Ailing Zeng](https://ailingzeng.site/)<sup>1</sup>, [Ruimao Zhang](http://www.zhangruimao.site/)<sup>2</sup>, [Lei Zhang](https://www.leizhang.org/)<sup>1</sup>
<sup>1</sup>[International Digital Economy Academy](https://www.idea.edu.cn/research/cvr.html)  <sup>2</sup>[The Chinese University of Hong Kong, Shenzhen](https://www.cuhk.edu.cn/en)
2024.07.12: X-Pose supports controllable animal face animation. See details here.
2024.07.02: X-Pose is accepted to ECCV 2024 (we renamed the model from UniPose to X-Pose to avoid confusion with similarly named previous works).
2024.02.14: We updated a file listing all 1,237 classes in the UniKPT dataset.
2023.11.28: We are excited to highlight X-Pose's 68-point face keypoint detection across arbitrary categories in this figure. The definition of the face keypoints follows this dataset.
2023.11.09: Thanks to OpenXLab, you can try a quick online demo. We look forward to your feedback!
2023.11.1: We release the inference code, demo, checkpoints, and the annotation of the UniKPT dataset.
2023.10.13: We release the arXiv version.
X-Pose has strong fine-grained localization and generalization abilities across image styles, categories, and poses.
• X-Pose is the first end-to-end prompt-based keypoint detection framework.
• It supports multi-modal prompts, including textual and visual prompts, to detect arbitrary keypoints (e.g., on articulated, rigid, and soft objects).
Clone this repo

```bash
git clone https://github.com/IDEA-Research/X-Pose.git
cd X-Pose
```
Install the needed packages

```bash
pip install -r requirements.txt
```
Compile the CUDA operators

```bash
cd models/UniPose/ops
python setup.py build install
# unit test (should see all checking is True)
python test.py
cd ../../..
```
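Building the operators requires a CUDA-enabled PyTorch installation. The snippet below is a minimal, purely illustrative environment check (not part of the repo) that can be run before compiling:

```python
# Illustrative environment check; not part of the X-Pose codebase.
import torch

print(torch.__version__)           # PyTorch version used for the build
print(torch.version.cuda)          # CUDA version PyTorch was compiled against
print(torch.cuda.is_available())   # should be True, otherwise the custom ops cannot run
```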
• We have released the textual prompt-based branch for inference. Since visual prompts require a substantial amount of user input, we are currently exploring more user-friendly platforms to support this functionality.
• Since X-Pose has learned a strong structural prior, it is best to use one of the predefined skeletons as the keypoint textual prompt; they are listed in predefined_keypoints.py.
• If no keypoint prompt is provided, we try to match an appropriate skeleton based on the instance category given by the user. If that fails, we fall back to the animal skeleton, which covers the widest range of categories and testing requirements (see the sketch after this list).
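The sketch below illustrates this fallback logic. It assumes predefined_keypoints.py exposes one module-level dict per category (e.g., "person", "face", "animal") containing the keypoint names and skeleton; the exact names and structure should be verified against the file itself.

```python
# Sketch only: the layout of predefined_keypoints.py (one dict per category,
# an "animal" default entry) is an assumption; check the file for the actual names.
import predefined_keypoints

def select_keypoint_prompt(category, skeleton_name=None):
    """Pick the keypoint textual prompt for a given instance category."""
    skeletons = vars(predefined_keypoints)        # all module-level definitions
    if skeleton_name is not None and skeleton_name in skeletons:
        return skeletons[skeleton_name]           # explicit -k choice wins
    if category in skeletons:
        return skeletons[category]                # match by instance category
    return skeletons["animal"]                    # widest-coverage fallback
```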
Replace {GPU ID}, image_you_want_to_test.jpg, and "dir you want to save the output" with appropriate values in the following command:
```bash
CUDA_VISIBLE_DEVICES={GPU ID} python inference_on_a_image.py \
  -c config/UniPose_SwinT.py \
  -p weights/unipose_swint.pth \
  -i image_you_want_to_test.jpg \
  -o "dir you want to save the output" \
  -t "instance categories" \
  -k "keypoint_skeleton_text"
```

• -t: instance categories, e.g., "person", "face", "left hand", "horse", "car", "skirt", "table".
• -k: keypoint skeleton text (optional); if needed, select an entry from predefined_keypoints.py.
We also support inference with Gradio:

```bash
python app.py
```
| | name | backbone | Keypoint AP on COCO | Checkpoint | Config |
|---|---|---|---|---|---|
| 1 | X-Pose | Swin-T | 74.4 | Google Drive / OpenXLab | GitHub Link |
| 2 | X-Pose | Swin-L | 76.8 | Coming Soon | Coming Soon |
| Dataset | Keypoints | Classes | Images | Instances | Unified Images | Unified Instances |
|---|---|---|---|---|---|---|
| COCO | 17 | 1 | 58,945 | 156,165 | 58,945 | 156,165 |
| 300W-Face | 68 | 1 | 3,837 | 4,437 | 3,837 | 4,437 |
| OneHand10K | 21 | 1 | 11,703 | 11,289 | 2,000 | 2,000 |
| Human-Art | 17 | 1 | 50,000 | 123,131 | 50,000 | 123,131 |
| AP-10K | 17 | 54 | 10,015 | 13,028 | 10,015 | 13,028 |
| APT-36K | 17 | 30 | 36,000 | 53,006 | 36,000 | 53,006 |
| MacaquePose | 17 | 1 | 13,083 | 16,393 | 2,000 | 2,320 |
| Animal Kingdom | 23 | 850 | 33,099 | 33,099 | 33,099 | 33,099 |
| AnimalWeb | 9 | 332 | 22,451 | 21,921 | 22,451 | 21,921 |
| Vinegar Fly | 31 | 1 | 1,500 | 1,500 | 1,500 | 1,500 |
| Desert Locust | 34 | 1 | 700 | 700 | 700 | 700 |
| Keypoint-5 | 55/31 | 5 | 8,649 | 8,649 | 2,000 | 2,000 |
| MP-100 | 561/293 | 100 | 16,943 | 18,000 | 16,943 | 18,000 |
| UniKPT | 338 | 1,237 | - | - | 226,547 | 418,487 |
• UniKPT is a unified dataset built from 13 existing datasets and is intended for non-commercial research purposes only.
• All images in the UniKPT dataset originate from the datasets listed in the table above. To access the images, please download them from their original repositories.
• We provide annotations with precise textual descriptions of the keypoints for effective training. For convenience, the text annotations are also available at the provided link (see the sketch after this list for one way to inspect them).
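As a purely illustrative way to inspect the downloaded annotations, the sketch below assumes a COCO-style JSON layout (images, annotations, categories) with textual keypoint names per category; both the file name and the field layout are assumptions rather than the documented UniKPT format, so verify them against the actual download.

```python
# Illustrative only: the file name and JSON layout are assumptions,
# not the documented UniKPT format; verify against the downloaded file.
import json

with open("unikpt_annotations.json") as f:   # hypothetical file name
    data = json.load(f)

print(len(data["images"]), "images")
print(len(data["annotations"]), "keypoint-labelled instances")

# Each category is expected to carry the textual keypoint names used as prompts.
for cat in data["categories"][:3]:
    print(cat.get("name"), cat.get("keypoints"))
```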
If you find this repository useful for your work, please consider citing it as follows:
```bibtex
@inproceedings{xpose,
  title={X-Pose: Detecting Any Keypoints},
  author={Yang, Jie and Zeng, Ailing and Zhang, Ruimao and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  year={2024}
}

@inproceedings{yang2023neural,
  title={Neural Interactive Keypoint Detection},
  author={Yang, Jie and Zeng, Ailing and Li, Feng and Liu, Shilong and Zhang, Ruimao and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={15122--15132},
  year={2023}
}

@inproceedings{yang2022explicit,
  title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
  author={Yang, Jie and Zeng, Ailing and Liu, Shilong and Li, Feng and Zhang, Ruimao and Zhang, Lei},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023}
}
```