Zejun-Yang / AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Apache License 2.0

size mismatch for PPE.pe #197

Open godsonzhou opened 1 month ago

godsonzhou commented 1 month ago

When I prepared the environment with pip 24.2, every package installed successfully except torchsde:

```
$ pip install torchsde==0.2.5
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting torchsde==0.2.5
  Using cached https://mirrors.aliyun.com/pypi/packages/73/8d/efd3e7b31ea854d0bd6886aa3cf44914adce113a6d460850af41ac1dd4dd/torchsde-0.2.5-py3-none-any.whl (59 kB)
WARNING: Ignoring version 0.2.5 of torchsde since it has invalid metadata: Requested torchsde==0.2.5 from https://mirrors.aliyun.com/pypi/packages/73/8d/efd3e7b31ea854d0bd6886aa3cf44914adce113a6d460850af41ac1dd4dd/torchsde-0.2.5-py3-none-any.whl#sha256=4c34373a94a357bdf60bbfee00c850f3563d634491555820b900c9a4f7eff300 has invalid metadata: . suffix can only be used with == or != operators
    numpy (>=1.19.) ; python_version >= "3.7"

Please use pip<24.1 if you need to use this version.
ERROR: Could not find a version that satisfies the requirement torchsde==0.2.5 (from versions: 0.2.5, 0.2.6)
ERROR: No matching distribution found for torchsde==0.2.5
```

So I removed the version pin on torchsde in requirements.txt, and `pip install -r requirements.txt` completed successfully.
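As the pip warning itself suggests, an alternative workaround (instead of dropping the pin) is to downgrade pip below 24.1, which still tolerates the malformed `numpy (>=1.19.)` requirement in torchsde 0.2.5's metadata:

```shell
# Older pip accepts the invalid metadata in torchsde 0.2.5
python -m pip install "pip<24.1"
pip install torchsde==0.2.5
```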

Then, when I run `python -m scripts.app`, the following error occurs:

```
(anip) D:\gitrepos\AniPortrait>python -m scripts.app
Some weights of the model checkpoint at ./pretrained_model/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at ./pretrained_model/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at ./pretrained_model/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.weight', 'lm_head.bias']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at ./pretrained_model/wav2vec2-base-960h and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "D:\pinokio\bin\miniconda\envs\anip\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\pinokio\bin\miniconda\envs\anip\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\gitrepos\AniPortrait\scripts\app.py", line 49, in <module>
    a2p_model.load_state_dict(torch.load(audio_infer_config['pretrained_model']['a2p_ckpt']), strict=False)
  File "D:\pinokio\bin\miniconda\envs\anip\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Audio2PoseModel:
        size mismatch for PPE.pe: copying a param with shape torch.Size([1, 630, 512]) from checkpoint, the shape in current model is torch.Size([1, 600, 512]).
```

Is this a torchsde version problem?
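Note that passing `strict=False` (as `scripts/app.py` already does) cannot avoid this error: it only tolerates missing or unexpected keys, while any key present in both the checkpoint and the model must still match in shape. A pure-Python sketch of that behavior (the `check_state_dict` helper below is hypothetical, mimicking torch's logic, not the actual torch source):

```python
def check_state_dict(model_shapes, ckpt_shapes, strict=True):
    """Mimic PyTorch's load_state_dict error rules on {name: shape} dicts.

    strict=False silences missing/unexpected-key errors only;
    a shape mismatch on a shared key always raises."""
    errors = []
    for name, shape in ckpt_shapes.items():
        if name not in model_shapes:
            if strict:
                errors.append(f"unexpected key {name}")
            continue  # ignored when strict=False
        if model_shapes[name] != shape:
            errors.append(
                f"size mismatch for {name}: copying a param with shape {shape}, "
                f"the shape in current model is {model_shapes[name]}"
            )
    if strict:
        errors += [f"missing key {n}" for n in model_shapes if n not in ckpt_shapes]
    if errors:
        raise RuntimeError("Error(s) in loading state_dict:\n\t" + "\n\t".join(errors))

# Even with strict=False, the PPE.pe shape mismatch raises:
try:
    check_state_dict({"PPE.pe": (1, 600, 512)}, {"PPE.pe": (1, 630, 512)}, strict=False)
except RuntimeError as e:
    print(e)
```

So the error points at a model/checkpoint configuration mismatch, not at which torchsde build got installed.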
WarmCongee commented 2 weeks ago

I encountered the same problem, how did you solve it?

xwb123 commented 1 week ago

+1

WarmCongee commented 1 week ago

+1

I fixed this problem by changing the initialization length of the pose embedding in the code. The specific location is line 43 of https://github.com/Zejun-Yang/AniPortrait/blob/main/src/audio_models/pose_model.py: change `max_len` to 630. I'm not sure whether this is the intended fix, though.
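For context, the `max_len` in question sets the table length of the positional encoding (assuming pose_model.py uses the usual sinusoidal Transformer formulation): the checkpoint was apparently trained with a 630-step table, while the released code builds a 600-step one, hence `(1, 630, 512)` vs `(1, 600, 512)`. A minimal stdlib-only sketch of how `max_len` determines that shape:

```python
import math

def positional_encoding(max_len, d_model):
    """Standard sinusoidal positional-encoding table of shape (max_len, d_model)."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)       # even dims: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dims: cosine
    return pe

# max_len=600 gives the current model's table; 630 matches the checkpoint's PPE.pe
table = positional_encoding(630, 512)
print(len(table), len(table[0]))  # 630 512
```

Since this only changes the table length (the encoding values for the first 600 positions are identical), loading the checkpoint after bumping `max_len` to 630 should not require any retraining.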