I've reproduced your code, but the results are inconsistent with those from the weights you've open-sourced. I'm wondering if your model_state-10000.th was only trained on the TikTok dataset? Also, could I possibly get the optimizer_state_latest.th that corresponds to your first-stage weight model_state-10000.th? Thank you in advance.

Boese0601 / MagicDance

[ICML 2024] MagicPose (also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

https://boese0601.github.io/magicdance/

#18

Closed Jeff-Fudan closed 4 months ago

Boese0601 commented 4 months ago

Yes, I trained the model only on the TikTok dataset. No other data. Can you first set strict=True in your load_state_dict call and check whether the pretrained weights from stage 1 match the model architecture?

As for optimizer_state_latest.th, I don't have it because I lost access to the ByteDance server after I left. But let me see if I can do anything.

Jeff-Fudan commented 4 months ago

Hello, what I meant was that the results from the first stage are inconsistent. I followed your code exactly, with a batch size of 64, but the first stage of training results in many artifacts. However, the effects are quite good when using the model_state-10000.th weights you provided.

Boese0601 commented 4 months ago

Do the images in the training log output contain artifacts as well? I've never come across such an issue on my end. Are you sure the environment is correct, and that loading the weights raises no error when strict=True is set in load_state_dict?

Jeff-Fudan commented 4 months ago

I've set up my environment according to your 'env' configuration. The only difference is that I'm using PyTorch-2.1. I haven't made any changes to your code, and I'm using the TikTok data you provided. However, the results I'm getting are significantly different from those using your weights. May I kindly ask if these weights were produced using this exact set of code? I appreciate your patience and guidance.

Boese0601 commented 4 months ago

I'm 100% sure the code is correct, since I can reproduce the results on my end. Please try again with the env I provided. And please confirm again: with strict=True set in load_state_dict, is there no error when loading the weights? With strict=False, it won't raise an error even if the weights are wrong, and the model will be initialized with the default Stable Diffusion weights. Please make sure there's no error with strict=True.
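For readers following along, here is a minimal sketch of the behaviour described above. It uses PyTorch's standard `nn.Module.load_state_dict` on a toy `nn.Linear`; the toy model and the checkpoint key names are hypothetical stand-ins, not the MagicDance model or its checkpoint layout.

```python
import torch
from torch import nn

# Toy stand-in for the real network (hypothetical; not the MagicDance model).
model = nn.Linear(4, 2)

# A checkpoint whose key names do not match the model, e.g. one saved from a
# different architecture or with an extra prefix (hypothetical key names).
mismatched_ckpt = {
    "layer.weight": torch.zeros(2, 4),
    "layer.bias": torch.zeros(2),
}

# strict=False: no exception is raised, but nothing is actually loaded.
before = model.weight.detach().clone()
report = model.load_state_dict(mismatched_ckpt, strict=False)
print("missing keys:   ", report.missing_keys)
print("unexpected keys:", report.unexpected_keys)
print("weights unchanged:", torch.equal(before, model.weight))

# strict=True: the same mismatch raises a RuntimeError immediately.
try:
    model.load_state_dict(mismatched_ckpt, strict=True)
    strict_failed = False
except RuntimeError as err:
    strict_failed = True
    print("strict=True raised:", str(err).splitlines()[0])
```

With strict=False, the returned report is the only sign that the checkpoint did not match, so it is worth printing `missing_keys` and `unexpected_keys` whenever strict loading is disabled.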

Jeff-Fudan commented 4 months ago

Hello, thank you for your affirmative response. What I meant was that the results from the first stage don't match up. The 'setting strict=True in load_state_dict' you mentioned should be something to pay attention to in the second stage, right? In the first stage, we're just loading the weights of the ControlNet, aren't we? Would the version of PyTorch really have such a significant impact on the results? It seems like this is the only difference left.

Boese0601 commented 4 months ago

No, this argument is also used when loading weights during inference. Just set it to True and see if there's an error. We aren't using ControlNet for appearance control pretraining; that stage uses a weight copy of the entire SD-UNet. ControlNet is for stage 2 only.

Jeff-Fudan commented 4 months ago

Alright, I'm very interested in your work! It's a fantastic project. However, the weights I trained should be the same as yours, using the same inference script and loading method. But the weights I trained have many artifacts, while the weights you provided are excellent. Could we possibly connect on WeChat? I would be very grateful! My WeChat ID is: iHenrenzhenderen.

Boese0601 commented 4 months ago

Issue solved :)