Doubiiu / CodeTalker

[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
MIT License

where to get pre-trained models model.pth.tar ? #30

Closed MagicRedZero closed 1 year ago

MagicRedZero commented 1 year ago

Make sure the paths of pre-trained models are correct, i.e., vqvae_pretrained_path and wav2vec2model_path in config/<vocaset|BIWI>/stage2.yaml.

cat config/vocaset/stage2.yaml
vqvae_pretrained_path: RUN/vocaset/CodeTalker_s1/model/model.pth.tar
wav2vec2model_path: facebook/wav2vec2-base-960h

where to get RUN/vocaset/CodeTalker_s1/model/model.pth.tar and facebook/wav2vec2-base-960h ?

Doubiiu commented 1 year ago
  1. You should train stage 1 of CodeTalker first, then set the trained model weights path in the stage 2 config accordingly.
  2. It is the pretrained wav2vec2 model weights released by Facebook. If they cannot be downloaded automatically, you can download the files manually and change the path to your local wav2vec2-base-960h folder.
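For point 2, a minimal sketch of how one might repoint the stage 2 config at a locally downloaded wav2vec2-base-960h folder. The helper name, the example local path, and the plain-text edit (rather than a YAML parser) are my own choices, not part of the repo:

```python
# Hypothetical helper: rewrite the wav2vec2model_path entry in stage2.yaml
# so it points at a local wav2vec2-base-960h folder instead of the Hub name.
def point_to_local_wav2vec(cfg_text: str, local_dir: str) -> str:
    out = []
    for line in cfg_text.splitlines():
        if line.strip().startswith("wav2vec2model_path:"):
            # Preserve indentation, replace only the value.
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}wav2vec2model_path: {local_dir}"
        out.append(line)
    return "\n".join(out) + "\n"

sample = (
    "vqvae_pretrained_path: RUN/vocaset/CodeTalker_s1/model/model.pth.tar\n"
    "wav2vec2model_path: facebook/wav2vec2-base-960h\n"
)
patched = point_to_local_wav2vec(sample, "/data/models/wav2vec2-base-960h")
```

Writing `patched` back to config/vocaset/stage2.yaml would make the loader use the local folder.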
MagicRedZero commented 1 year ago

Can we use vocaset_stage1.pth.tar as RUN/vocaset/CodeTalker_s1/model/model.pth.tar and vocaset_stage2.pth.tar as RUN/vocaset/CodeTalker_s2/model/model.pth.tar, and then run "sh scripts/test.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2"?

Doubiiu commented 1 year ago

Sure you can do that.
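A sketch of the copy step described above, assuming the two released checkpoints sit in some source folder; the function name and argument defaults are hypothetical, but the destination paths follow the config quoted earlier in this thread:

```python
import shutil
from pathlib import Path

def install_checkpoints(src_dir, run_root="RUN/vocaset"):
    """Copy the released stage 1/2 checkpoints into the layout
    that config/vocaset/stage2.yaml expects."""
    mapping = {
        "vocaset_stage1.pth.tar": "CodeTalker_s1/model/model.pth.tar",
        "vocaset_stage2.pth.tar": "CodeTalker_s2/model/model.pth.tar",
    }
    for src_name, dst_rel in mapping.items():
        dst = Path(run_root) / dst_rel
        dst.parent.mkdir(parents=True, exist_ok=True)  # create RUN/... dirs
        shutil.copy2(Path(src_dir) / src_name, dst)
```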

MagicRedZero commented 1 year ago

[2023-05-06 07:41:54,511 INFO test_pred.py line 23 7740]=>=> creating model ... Some weights of the model checkpoint at facebook/wav2vec2-base-960h were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']

I get this result. Do we need any other data?

Doubiiu commented 1 year ago

Since this method animates a neutral 3D face given speech signals, you need to download and preprocess the data according to the instructions in Dataset Preparation for VOCASET or BIWI.
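A quick hedged sketch for sanity-checking the VOCASET folder before preprocessing. The required-file list below is inferred from the directory listing later in this thread, not from the repo itself:

```python
from pathlib import Path

# Files that process_voca_data.py appears to consume (inferred, not official).
REQUIRED = [
    "data_verts.npy",
    "raw_audio_fixed.pkl",
    "subj_seq_to_idx.pkl",
    "templates.pkl",
]

def missing_files(dataset_dir):
    """Return the names from REQUIRED that are absent in dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```

Running `missing_files("vocaset")` before preprocessing makes it obvious which downloads are still needed.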

MagicRedZero commented 1 year ago

I use vocaset:

vocaset/
├── data_verts.npy
├── FLAME_sample.ply
├── init_expression_basis.npy
├── processed_audio_deepspeech.pkl
├── process_voca_data.py
├── raw_audio_fixed.pkl
├── readme.pdf
├── subj_seq_to_idx.pkl
├── templates.pkl
├── vertices_npy
│   ├── condition_FaceTalk_170725_00137_TA_subject_FaceTalk_170809_00138_TA.npy
│   ├── FaceTalk_170725_00137_TA_sentence01.npy
│   ├── FaceTalk_170725_00137_TA_sentence40.npy
│   ├── FaceTalk_170728_03272_TA_sentence01.npy
│   ├── FaceTalk_170728_03272_TA_sentence40.npy
│   ├── FaceTalk_170731_00024_TA_sentence01.npy
│   ├── FaceTalk_170731_00024_TA_sentence40.npy
│   ├── FaceTalk_170809_00138_TA_sentence01.npy
│   ├── FaceTalk_170811_03274_TA_sentence01.npy
│   ├── FaceTalk_170811_03275_TA_sentence01.npy
│   ├── FaceTalk_170811_03275_TA_sentence40.npy
│   ├── FaceTalk_170904_00128_TA_sentence01.npy
│   ├── FaceTalk_170904_00128_TA_sentence40.npy
│   ├── FaceTalk_170904_03276_TA_sentence01.npy
│   ├── FaceTalk_170904_03276_TA_sentence40.npy
│   ├── FaceTalk_170908_03277_TA_sentence01.npy
│   ├── FaceTalk_170908_03277_TA_sentence40.npy
│   ├── FaceTalk_170912_03278_TA_sentence01.npy
│   ├── FaceTalk_170912_03278_TA_sentence40.npy
│   ├── FaceTalk_170913_03279_TA_sentence01.npy
│   ├── FaceTalk_170913_03279_TA_sentence40.npy
│   ├── FaceTalk_170915_00223_TA_sentence01.npy
│   └── FaceTalk_170915_00223_TA_sentence40.npy
├── vocaset_stage1.pth.tar
├── vocaset_stage2.pth.tar
└── wav
    ├── FaceTalk_170725_00137_TA_sentence01.wav
    ├── FaceTalk_170725_00137_TA_sentence40.wav
    ├── FaceTalk_170728_03272_TA_sentence01.wav
    ├── FaceTalk_170728_03272_TA_sentence40.wav
    ├── FaceTalk_170731_00024_TA_sentence01.wav
    ├── FaceTalk_170731_00024_TA_sentence40.wav
    ├── FaceTalk_170809_00138_TA_sentence01.wav
    ├── FaceTalk_170809_00138_TA_sentence40.wav
    ├── FaceTalk_170811_03274_TA_sentence03.wav
    ├── FaceTalk_170811_03274_TA_sentence40.wav
    ├── FaceTalk_170811_03275_TA_sentence01.wav
    ├── FaceTalk_170811_03275_TA_sentence40.wav
    ├── FaceTalk_170904_00128_TA_sentence01.wav
    ├── FaceTalk_170904_00128_TA_sentence40.wav
    ├── FaceTalk_170904_03276_TA_sentence01.wav
    ├── FaceTalk_170904_03276_TA_sentence40.wav
    ├── FaceTalk_170908_03277_TA_sentence01.wav
    ├── FaceTalk_170908_03277_TA_sentence40.wav
    ├── FaceTalk_170912_03278_TA_sentence01.wav
    ├── FaceTalk_170913_03279_TA_sentence01.wav
    ├── FaceTalk_170915_00223_TA_sentence40.wav
    └── man.wav

MagicRedZero commented 1 year ago

I found the reason. Thank you very much!

MagicRedZero commented 1 year ago

cd vocaset && python process_voca_data.py

This produces vertices_npy/ and wav/.

But config/vocaset/stage2.yaml has wav_path: wav_mini.

Since wav/ != wav_mini, the loader finds nothing: "Loaded data: Train-0, Val-0, Test-0". Setting wav_path to wav (or renaming the folder) resolves it.
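The empty-split symptom can be reproduced with a tiny check; this helper is my own illustration of why a mismatched wav_path yields zero samples, not code from the repo:

```python
from pathlib import Path

def count_wavs(dataset_dir, wav_path):
    """Count .wav files under dataset_dir/wav_path; 0 if the dir is missing.
    A wav_path that names a nonexistent folder (e.g. wav_mini when only
    wav/ exists) gives 0, which is exactly the Train-0/Val-0/Test-0 case."""
    folder = Path(dataset_dir) / wav_path
    return len(list(folder.glob("*.wav"))) if folder.is_dir() else 0
```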


sh scripts/render.sh

It failed with "Failed to open PLY file." Changing --dataset_dir . to --dataset_dir ./ fixed it.
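That trailing-slash sensitivity suggests the script concatenates dataset_dir with a relative filename as plain strings. A hedged sketch of the failure mode and the usual fix (os.path.join); the PLY filename below is taken from the directory listing earlier in the thread and is only illustrative:

```python
import os

def ply_path(dataset_dir, rel="vocaset/FLAME_sample.ply"):
    """Join dataset_dir and a relative file path; insensitive to whether
    dataset_dir ends with a slash."""
    return os.path.join(dataset_dir, rel)

# Naive string concatenation reproduces the bug: "." + "vocaset/..." gives
# ".vocaset/FLAME_sample.ply", a path that does not exist, so the PLY
# loader fails; "./" happens to work only because of the trailing slash.
broken = "." + "vocaset/FLAME_sample.ply"
```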