Doubiiu / CodeTalker

[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
MIT License

where to get pre-trained models model.pth.tar ? #30

Closed MagicRedZero closed 1 year ago

MagicRedZero commented 1 year ago

Make sure the paths of pre-trained models are correct, i.e., vqvae_pretrained_path and wav2vec2model_path in config/<vocaset|BIWI>/stage2.yaml.

cat config/vocaset/stage2.yaml
vqvae_pretrained_path: RUN/vocaset/CodeTalker_s1/model/model.pth.tar
wav2vec2model_path: facebook/wav2vec2-base-960h

where to get RUN/vocaset/CodeTalker_s1/model/model.pth.tar and facebook/wav2vec2-base-960h ?

Doubiiu commented 1 year ago
  1. You should train stage 1 of CodeTalker first, then set the trained model weights path in the stage 2 config accordingly.
  2. It is the pretrained wav2vec2 model weights released by Facebook. If they cannot be downloaded automatically, you can download the files manually and change the path to your local wav2vec2-base-960h folder.
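For point 2, a minimal sketch of how one might repoint the stage 2 config at a locally downloaded wav2vec2-base-960h folder. The helper name, the example local path, and the plain-text edit (rather than a YAML parser) are my own choices, not part of the repo:

```python
# Hypothetical helper: rewrite the wav2vec2model_path entry in stage2.yaml
# so it points at a local wav2vec2-base-960h folder instead of the Hub name.
def point_to_local_wav2vec(cfg_text: str, local_dir: str) -> str:
    out = []
    for line in cfg_text.splitlines():
        if line.strip().startswith("wav2vec2model_path:"):
            # Preserve indentation, replace only the value.
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}wav2vec2model_path: {local_dir}"
        out.append(line)
    return "\n".join(out) + "\n"

sample = (
    "vqvae_pretrained_path: RUN/vocaset/CodeTalker_s1/model/model.pth.tar\n"
    "wav2vec2model_path: facebook/wav2vec2-base-960h\n"
)
patched = point_to_local_wav2vec(sample, "/data/models/wav2vec2-base-960h")
```

Writing `patched` back to config/vocaset/stage2.yaml would make the loader use the local folder.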
MagicRedZero commented 1 year ago

Can we use vocaset_stage1.pth.tar as RUN/vocaset/CodeTalker_s1/model/model.pth.tar and vocaset_stage2.pth.tar as RUN/vocaset/CodeTalker_s2/model/model.pth.tar, and then run "sh scripts/test.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2"?

Doubiiu commented 1 year ago

Sure you can do that.
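A sketch of the copy step described above, assuming the two released checkpoints sit in some source folder; the function name and argument defaults are hypothetical, but the destination paths follow the config quoted earlier in this thread:

```python
import shutil
from pathlib import Path

def install_checkpoints(src_dir, run_root="RUN/vocaset"):
    """Copy the released stage 1/2 checkpoints into the layout
    that config/vocaset/stage2.yaml expects."""
    mapping = {
        "vocaset_stage1.pth.tar": "CodeTalker_s1/model/model.pth.tar",
        "vocaset_stage2.pth.tar": "CodeTalker_s2/model/model.pth.tar",
    }
    for src_name, dst_rel in mapping.items():
        dst = Path(run_root) / dst_rel
        dst.parent.mkdir(parents=True, exist_ok=True)  # create RUN/... dirs
        shutil.copy2(Path(src_dir) / src_name, dst)
```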

MagicRedZero commented 1 year ago

[2023-05-06 07:41:54,511 INFO test_pred.py line 23 7740]=>=> creating model ... Some weights of the model checkpoint at facebook/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']

I get this result. Do we need any other data?

Doubiiu commented 1 year ago

Since this method animates a neutral 3D face given speech signals, you need to download and preprocess the data according to the instructions in Dataset Preparation for VOCASET or BIWI.
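A quick hedged sketch for sanity-checking the VOCASET folder before preprocessing. The required-file list below is inferred from the directory listing later in this thread, not from the repo itself:

```python
from pathlib import Path

# Files that process_voca_data.py appears to consume (inferred, not official).
REQUIRED = [
    "data_verts.npy",
    "raw_audio_fixed.pkl",
    "subj_seq_to_idx.pkl",
    "templates.pkl",
]

def missing_files(dataset_dir):
    """Return the names from REQUIRED that are absent in dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```

Running `missing_files("vocaset")` before preprocessing makes it obvious which downloads are still needed.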

MagicRedZero commented 1 year ago

I use vocaset:

vocaset/
├── data_verts.npy
├── FLAME_sample.ply
├── init_expression_basis.npy
├── processed_audio_deepspeech.pkl
├── process_voca_data.py
├── raw_audio_fixed.pkl
├── readme.pdf
├── subj_seq_to_idx.pkl
├── templates.pkl
├── vertices_npy
│   ├── condition_FaceTalk_170725_00137_TA_subject_FaceTalk_170809_00138_TA.npy
│   ├── FaceTalk_170725_00137_TA_sentence01.npy
│   ├── FaceTalk_170725_00137_TA_sentence40.npy
│   ├── FaceTalk_170728_03272_TA_sentence01.npy
│   ├── FaceTalk_170728_03272_TA_sentence40.npy
│   ├── FaceTalk_170731_00024_TA_sentence01.npy
│   ├── FaceTalk_170731_00024_TA_sentence40.npy
│   ├── FaceTalk_170809_00138_TA_sentence01.npy
│   ├── FaceTalk_170811_03274_TA_sentence01.npy
│   ├── FaceTalk_170811_03275_TA_sentence01.npy
│   ├── FaceTalk_170811_03275_TA_sentence40.npy
│   ├── FaceTalk_170904_00128_TA_sentence01.npy
│   ├── FaceTalk_170904_00128_TA_sentence40.npy
│   ├── FaceTalk_170904_03276_TA_sentence01.npy
│   ├── FaceTalk_170904_03276_TA_sentence40.npy
│   ├── FaceTalk_170908_03277_TA_sentence01.npy
│   ├── FaceTalk_170908_03277_TA_sentence40.npy
│   ├── FaceTalk_170912_03278_TA_sentence01.npy
│   ├── FaceTalk_170912_03278_TA_sentence40.npy
│   ├── FaceTalk_170913_03279_TA_sentence01.npy
│   ├── FaceTalk_170913_03279_TA_sentence40.npy
│   ├── FaceTalk_170915_00223_TA_sentence01.npy
│   └── FaceTalk_170915_00223_TA_sentence40.npy
├── vocaset_stage1.pth.tar
├── vocaset_stage2.pth.tar
└── wav
    ├── FaceTalk_170725_00137_TA_sentence01.wav
    ├── FaceTalk_170725_00137_TA_sentence40.wav
    ├── FaceTalk_170728_03272_TA_sentence01.wav
    ├── FaceTalk_170728_03272_TA_sentence40.wav
    ├── FaceTalk_170731_00024_TA_sentence01.wav
    ├── FaceTalk_170731_00024_TA_sentence40.wav
    ├── FaceTalk_170809_00138_TA_sentence01.wav
    ├── FaceTalk_170809_00138_TA_sentence40.wav
    ├── FaceTalk_170811_03274_TA_sentence03.wav
    ├── FaceTalk_170811_03274_TA_sentence40.wav
    ├── FaceTalk_170811_03275_TA_sentence01.wav
    ├── FaceTalk_170811_03275_TA_sentence40.wav
    ├── FaceTalk_170904_00128_TA_sentence01.wav
    ├── FaceTalk_170904_00128_TA_sentence40.wav
    ├── FaceTalk_170904_03276_TA_sentence01.wav
    ├── FaceTalk_170904_03276_TA_sentence40.wav
    ├── FaceTalk_170908_03277_TA_sentence01.wav
    ├── FaceTalk_170908_03277_TA_sentence40.wav
    ├── FaceTalk_170912_03278_TA_sentence01.wav
    ├── FaceTalk_170913_03279_TA_sentence01.wav
    ├── FaceTalk_170915_00223_TA_sentence40.wav
    └── man.wav

MagicRedZero commented 1 year ago

I found the reason. Thank you very much!

MagicRedZero commented 1 year ago

cd vocaset && python process_voca_data.py

This produces vertices_npy/ and wav/.

But config/vocaset/stage2.yaml has wav_path: wav_mini.

Since wav/ != wav_mini, the loader finds nothing: "Loaded data: Train-0, Val-0, Test-0". Setting wav_path to wav (or renaming the folder) resolves it.
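The empty-split symptom can be reproduced with a tiny check; this helper is my own illustration of why a mismatched wav_path yields zero samples, not code from the repo:

```python
from pathlib import Path

def count_wavs(dataset_dir, wav_path):
    """Count .wav files under dataset_dir/wav_path; 0 if the dir is missing.
    A wav_path that names a nonexistent folder (e.g. wav_mini when only
    wav/ exists) gives 0, which is exactly the Train-0/Val-0/Test-0 case."""
    folder = Path(dataset_dir) / wav_path
    return len(list(folder.glob("*.wav"))) if folder.is_dir() else 0
```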


sh scripts/render.sh

It failed with "Failed to open PLY file." Changing --dataset_dir . to --dataset_dir ./ fixed it.
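That trailing-slash sensitivity suggests the script concatenates dataset_dir with a relative filename as plain strings. A hedged sketch of the failure mode and the usual fix (os.path.join); the PLY filename below is taken from the directory listing earlier in the thread and is only illustrative:

```python
import os

def ply_path(dataset_dir, rel="vocaset/FLAME_sample.ply"):
    """Join dataset_dir and a relative file path; insensitive to whether
    dataset_dir ends with a slash."""
    return os.path.join(dataset_dir, rel)

# Naive string concatenation reproduces the bug: "." + "vocaset/..." gives
# ".vocaset/FLAME_sample.ply", a path that does not exist, so the PLY
# loader fails; "./" happens to work only because of the trailing slash.
broken = "." + "vocaset/FLAME_sample.ply"
```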