IDEA-Research / HumanSD

[ICCV 2023] The official implementation of paper "HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation"
Apache License 2.0

Evaluation bugs #3

Closed yjhong89 closed 1 year ago

yjhong89 commented 1 year ago

Hi! Thanks for sharing this great work!

While following the inference instructions in the README (running scripts/gradio/pose2img.py), I encountered some bugs, and even after resolving them the result is not good.

Bugs

  1. The functions in mmpose.apis have changed https://github.com/IDEA-Research/HumanSD/blob/464fcc755b5b5a0cc711b94201f3ac9f68d192e2/scripts/gradio/pose2img.py#L27

    • Function names were renamed:
    • init_pose_model -> init_model
    • inference_bottom_up_pose_model -> inference_bottomup
    • The arguments of each function also seem to have changed (see the compatibility sketch after this list).
  2. use_fp16=True in https://github.com/IDEA-Research/HumanSD/blob/464fcc755b5b5a0cc711b94201f3ac9f68d192e2/configs/humansd/humansd-inference.yaml#L25

    • If use_fp16 is True, an error occurs, as shown in the attached image.
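
For bug 1, a minimal compatibility shim for the rename might look like the sketch below. Note that it only aliases the names; as mentioned above, the argument and return conventions also differ between MMPose 0.x and 1.x, so the calling code may still need adjustment.

```python
# Hedged sketch: bridge the MMPose 0.x -> 1.x rename of the two entry points.
# Only the names are aliased here; argument/return formats also changed.
try:
    # MMPose 0.x API (the version HumanSD's scripts were written against)
    from mmpose.apis import init_pose_model, inference_bottom_up_pose_model
except ImportError:
    # MMPose 1.x renamed these functions
    from mmpose.apis import init_model as init_pose_model
    from mmpose.apis import inference_bottomup as inference_bottom_up_pose_model
```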

Result

```python
def predict(comparison_model, load_image_type, input_image, prompt, added_prompt,
            ddim_steps, detection_thresh, num_samples, scale, seed, eta, strength,
            negative_prompt, save_path="logs/gradio_images"):
    # Convert the PIL input to an RGB numpy array in HWC layout, then resize
    image = np.array(input_image.convert("RGB"))
    image = HWC3(image)
    image = resize_image(image, IMAGE_RESOLUTION)
    humansd_pose_image = image
    ...
```


But the result is quite bad.
- My result
![2023-05-09-20_24_01](https://github.com/IDEA-Research/HumanSD/assets/26890721/f5582321-536c-4a36-83d4-9f565bef3f5b)
- Result in README
![image](https://github.com/IDEA-Research/HumanSD/assets/26890721/cfaaedf1-844c-4fb0-a8c9-1d08b80cb4f2)

Can you give me any advice?
Thank you!
juxuan27 commented 1 year ago

Hi, @yjhong89 ! Thank you for your interest, and apologies for the late reply. I think you need to check that you've done the following correctly: (1) correct image channel order (PIL.Image and cv2 use different default channel orders); (2) correct image array range (0-255 vs. 0-1); (3) correct input image (make sure the pose is drawn with the correct colors). However, I still recommend installing MMPose; they provide installation instructions here. Moreover, our provided demo can run without code changes by simply supplying a pose image. You can try removing the MMPose-related code from the original inference file instead of rewriting the functions.
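
A minimal sketch of checks (1) and (2), assuming the pose image comes from a file (the file name is illustrative):

```python
import numpy as np
from PIL import Image

# PIL loads images as RGB; cv2.imread would give BGR instead, so an
# OpenCV-loaded image needs cv2.cvtColor(img, cv2.COLOR_BGR2RGB) first.
image = np.array(Image.open("pose.png").convert("RGB"))

# The gradio predict() shown above operates on uint8 arrays in the
# 0-255 range, laid out as HWC with 3 channels.
assert image.dtype == np.uint8
assert image.ndim == 3 and image.shape[2] == 3
```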

canteen-man commented 1 year ago

Hi, @juxuan27. Thank you for your excellent project. Can you share the specific mmpose and mmcv versions you used to run this demo? Thank you.

juxuan27 commented 1 year ago

Hi, @canteen-man. The versions are as follows:

```
Name: mmpose
Version: 0.29.0
Summary: OpenMMLab Pose Estimation Toolbox and Benchmark.
Home-page: https://github.com/open-mmlab/mmpose
Author: MMPose Contributors
Author-email: openmmlab@gmail.com
License: Apache License 2.0
Location: /home/juxuan/anaconda3/envs/sd-env/lib/python3.9/site-packages

Name: mmcv-full
Version: 1.7.0
Summary: OpenMMLab Computer Vision Foundation
Home-page: https://github.com/open-mmlab/mmcv
Author: MMCV Contributors
Author-email: openmmlab@gmail.com
License:
Location: /home/juxuan/anaconda3/envs/sd-env/lib/python3.9/site-packages
```
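
To reproduce this environment, pinning with `pip install mmpose==0.29.0 mmcv-full==1.7.0` should work (mmcv-full wheels are torch/CUDA specific, so installing it via openmim is often easier). A quick runtime check that the versions match:

```python
# Sanity-check the installed versions against the ones listed above
import mmcv
import mmpose

assert mmpose.__version__ == "0.29.0", mmpose.__version__
assert mmcv.__version__ == "1.7.0", mmcv.__version__
```
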
canteen-man commented 1 year ago

Hello, I just configured my mmcv and mmpose to the same versions as yours (mmpose 0.29.0, mmcv-full 1.7.0).

But I get the error `KeyError: 'BottomupPoseEstimator is not in the models registry'` after the following startup log:

```
You are running the demo of HumanSD
........
No module 'xformers'. Proceeding without it.
LatentPoseText2ImageDiffusion_HumanSD_originalloss: Running in eps-prediction mode
DiffusionWrapper has 865.92 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
```

Can you help me figure it out? Thank you.

canteen-man commented 1 year ago

@juxuan27 Sorry, I think I figured it out. The configs from mmpose also need to match the installed version. And I found the public link: https://51efdbe3-b1cb-4474.gradio.live/ It's really cool!
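
For anyone else hitting the same KeyError: BottomupPoseEstimator is an MMPose 1.x model type, so with mmpose 0.29.0 installed the pose config must be a 0.x-style one (HigherHRNet bottom-up configs there use type 'AssociativeEmbedding'). A hedged sketch of loading the pose model under 0.x, with illustrative paths:

```python
# Sketch for MMPose 0.29.0: pair the 0.x API with a 0.x-style config.
# A 1.x config declaring 'BottomupPoseEstimator' raises the KeyError above.
from mmpose.apis import init_pose_model

pose_model = init_pose_model(
    "path/to/higherhrnet_w48_coco_512x512_udp.py",       # 0.x config (illustrative path)
    "path/to/higherhrnet_w48_humanart_512x512_udp.pth",  # checkpoint (illustrative path)
    device="cuda:0",
)
```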

LiuqingZ2333 commented 1 year ago

Thank you very much for your wonderful work. I ran into a problem when running inference: the HumanSD checkpoint you provide is higherhrnet_w48_humanart_512x512_udp.pth, but the code refers to higherhrnet_w48_coco_512x512_udp.pth. Are these two files the same? If not, could you provide the COCO .pth file? Thank you very much!

juxuan27 commented 1 year ago

Hi, @LiuqingZ2333 ! These two files are not the same. higherhrnet_w48_humanart_512x512_udp.pth is trained on the union of HumanArt and MSCOCO, which gives it better generalization ability. You can find the file if you have applied for HumanSD's dataset; it is in the checkpoint folder. Of course, you can simply use the checkpoint trained on COCO (higherhrnet_w48_coco_512x512_udp.pth); the usage is the same.