aim-uofa / GenPercept

GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models
https://huggingface.co/spaces/guangkaixu/GenPercept
Creative Commons Zero v1.0 Universal
depth-estimation dichotomous-image-segmentation human-pose-estimation image-matting monocular-depth-estimation semantic-segmentation surface-normals

[Guangkai Xu](https://github.com/guangkaixu/), [Yongtao Ge](https://yongtaoge.github.io/), [Mingyu Liu](https://mingyulau.github.io/), [Chengxiang Fan](https://leaf1170124460.github.io/), [Kangyang Xie](https://github.com/felix-ky), [Zhiyue Zhao](https://github.com/ZhiyueZhau), [Hao Chen](https://stan-haochen.github.io/), [Chunhua Shen](https://cshen.github.io/)

Zhejiang University

### [HuggingFace (Space)](https://huggingface.co/spaces/guangkaixu/GenPercept) | [HuggingFace (Model)](https://huggingface.co/guangkaixu/GenPercept) | [arXiv](https://arxiv.org/abs/2403.06090)

#### 🔥 Fine-tune diffusion models for perception tasks, and run inference in only one step! ✈️
image

📢 News

🖥️ Dependencies

conda create -n genpercept python=3.10
conda activate genpercept
pip install -r requirements.txt
pip install -e .

🚀 Inference

Download the pre-trained models (genpercept_ckpt_v1.zip) from BaiduNetDisk (extraction code: g2cm), HuggingFace, or Rec Cloud Disk (to be uploaded). Unzip the package and put the checkpoints under ./weights/v1/.
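
The unpacking step above can be sketched as follows. This is a minimal sketch that assumes the archive was saved to the repository root under its original name. The internal layout of the zip is not documented here, so check the listing at the end and move files if they ended up in a nested folder.

```shell
# Assumption: genpercept_ckpt_v1.zip sits in the current directory (repo root).
ckpt_zip="genpercept_ckpt_v1.zip"
weights_dir="./weights/v1"

# Create the target folder named in the instructions above.
mkdir -p "$weights_dir"

if [ -f "$ckpt_zip" ]; then
    # -o overwrites existing files, -d selects the extraction directory.
    unzip -o "$ckpt_zip" -d "$weights_dir"
else
    echo "Checkpoint archive $ckpt_zip not found; download it first." >&2
fi

# Inspect the result to confirm the checkpoints are where the scripts expect them.
ls -R "$weights_dir"
```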

Then place your images in the ./input/$TASK_TYPE directory, where $TASK_TYPE is one of depth, normal, or dis, and run the corresponding script. The results are saved in ./output/$TASK_TYPE. For depth estimation, run:

bash scripts/inference_depth.sh
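
Staging images into the expected input folder could look like the sketch below. `stage_images` is a hypothetical helper, not part of the repository. It copies every regular file from a source folder, since the accepted image formats are not stated here.

```shell
# Hypothetical helper: copy files from a source folder into ./input/<task>.
# Usage: stage_images /path/to/photos depth
stage_images() {
    src_dir="$1"
    task="$2"

    # Only the three task types named in this README are accepted.
    case "$task" in
        depth|normal|dis) ;;
        *) echo "unknown task type: $task" >&2; return 1 ;;
    esac

    mkdir -p "./input/$task"

    # Copy top-level regular files only; subfolders are ignored.
    find "$src_dir" -maxdepth 1 -type f -exec cp {} "./input/$task/" \;
}
```

Run it from the repository root so that the relative ./input path matches the one used by the inference scripts.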

For surface normal estimation and dichotomous image segmentation, run the following scripts:

bash scripts/inference_normal.sh
bash scripts/inference_dis.sh
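
To process all three task types in one go, a small wrapper along these lines may help. `run_all_tasks` is a hypothetical helper. It assumes only the script names and folder layout given above, and skips any task whose input folder is missing or empty.

```shell
# Hypothetical wrapper: run each inference script whose input folder has files.
run_all_tasks() {
    for task in depth normal dis; do
        in_dir="./input/$task"
        script="scripts/inference_${task}.sh"

        if [ -d "$in_dir" ] && [ -n "$(ls -A "$in_dir" 2>/dev/null)" ]; then
            if [ -f "$script" ]; then
                bash "$script"
            else
                echo "missing $script" >&2
            fi
        else
            echo "skip $task: no images in $in_dir"
        fi
    done
}

# Call from the repository root.
run_all_tasks
```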

Thanks to our one-step perception paradigm, inference is fast: around 0.4 s per image on an A800 GPU.

📖 Recommended Works

🏅 Results in Paper

Depth and Surface Normal

image

Dichotomous Image Segmentation

image

Image Matting

image

Human Pose Estimation

image

🎫 License

For non-commercial use, this code is released under the LICENSE. For commercial use, please contact Chunhua Shen.

🎓 Citation

@article{xu2024diffusion,
  title={Diffusion Models Trained with Large Data Are Transferable Visual Models},
  author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.06090},
  year={2024}
}