UttaranB127 / speech2affective_gestures

This is the official implementation of the paper "Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning".
https://gamma.umd.edu/s2ag/
MIT License

Errors and missing files #16

Closed Ibrahimatef closed 2 years ago

Ibrahimatef commented 2 years ago

Hi, thanks for your great work! I tried to test the pretrained model on ted-db using the config/multimodal_context_v2.yml file with the train-s2ag argument set to False, but I ran into this error:

File "main_v2.py", line 120, in <module>
    train_data_ted, val_data_ted, test_data_ted = loader.load_ted_db_data(data_path, s2ag_config_args, args.train_s2ag)
  File "/content/speech2affective_gestures/loader_v2.py", line 589, in load_ted_db_data
    train_dataset = TedDBParamsMinimal(config_args.train_data_path[0])
  File "/content/speech2affective_gestures/loader_v2.py", line 442, in __init__
    self._make_speaker_model(self.lmdb_dir, precomputed_model)
AttributeError: 'TedDBParamsMinimal' object has no attribute '_make_speaker_model'

There are also two files referenced in config/multimodal_context_v2.yml that I could not find in the repo:

wordembed_path: /mnt/q/Gamma/Gestures/src/data/fasttext/crawl-300d-2M-subword.bin
val_net_path: /mnt/q/Gamma/Gestures/src/Speech2Gestures/speech2affective_gestures/outputs/train_h36m_gesture_autoencoder/gesture_autoencoder_checkpoint_best.bin

Could you help me resolve this error and find the missing files?

UttaranB127 commented 2 years ago

You can find the FastText features available for download here: https://fasttext.cc/docs/en/english-vectors.html

The gesture autoencoder checkpoints are available here: https://drive.google.com/drive/folders/1XVqISCEdEFvLJARKUqCiPAPo4QxnzFNI?usp=sharing
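Once downloaded, the two config entries can point at those files. For example (paths here are illustrative; adjust them to wherever you place the downloads):

```yaml
# Illustrative config/multimodal_context_v2.yml entries after downloading.
wordembed_path: data/fasttext/crawl-300d-2M-subword.bin
val_net_path: outputs/train_h36m_gesture_autoencoder/gesture_autoencoder_checkpoint_best.bin
```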

I have also added the _make_speaker_model method to TedDBParamsMinimal. Please pull the repo again and it should work.

Ibrahimatef commented 2 years ago

Thanks for the update, but now another error appears in the _make_speaker_model method:

building a speaker model...
Traceback (most recent call last):
  File "main_v2.py", line 120, in <module>
    train_data_ted, val_data_ted, test_data_ted = loader.load_ted_db_data(data_path, s2ag_config_args, args.train_s2ag)
  File "/content/speech2affective_gestures/loader_v2.py", line 609, in load_ted_db_data
    train_dataset = TedDBParamsMinimal(config_args.train_data_path[0])
  File "/content/speech2affective_gestures/loader_v2.py", line 442, in __init__
    self._make_speaker_model(self.lmdb_dir, precomputed_model)
  File "/content/speech2affective_gestures/loader_v2.py", line 458, in _make_speaker_model
    vid = video['vid']
TypeError: list indices must be integers or slices, not str

To fix this error, I changed vid = video['vid'] to vid = video[6]['vid'].
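A minimal sketch of this workaround (the record layout is an assumption based on what I observed, not code from the repo):

```python
# Sketch of the workaround: each deserialized lmdb record appears to be a
# list, with the metadata dict sitting at index 6 instead of at the top level.
def get_vid(video):
    if isinstance(video, dict):
        return video['vid']   # layout the original code expects
    return video[6]['vid']    # layout observed in my download

# Illustrative record (the first six entries are elided here):
record = [None] * 6 + [{'vid': 'SWvJxasiSZ8', 'start_frame_no': 260}]
print(get_vid(record))  # SWvJxasiSZ8
```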

UttaranB127 commented 2 years ago

Ok, thanks for the update! I'll fix the code.

Ibrahimatef commented 2 years ago

After fixing the previous error, self.num_total_samples (defined in processor_v2.py) comes out as zero, even though all the paths are correct:

Reading data '/content/speech2affective_gestures/data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14'...
Found the cache /content/speech2affective_gestures/data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14_s2ag_v2_cache_mfcc_14
  building a speaker model...
    indexed 173 videos
  building a language model...
    loaded from /content/speech2affective_gestures/data/ted_db/vocab_models_s2ag/vocab_cache.pkl
Traceback (most recent call last):
  File "main_v2.py", line 127, in <module>
    pr = processor.Processor(base_path, args, s2ag_config_args, data_loader, pose_dim, coords, audio_sr)
  File "/content/speech2affective_gestures/processor_v2.py", line 207, in __init__
    self.num_test_samples, 100. * self.num_test_samples / self.num_total_samples))
ZeroDivisionError: float division by zero

Do you know why this could happen?
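One way to sanity-check the cache here (a hypothetical stdlib-only sketch, not code from the repo):

```python
import os

# Hypothetical diagnostic: if num_total_samples ends up 0, a quick check is
# whether the lmdb cache directory actually holds a non-empty data.mdb file
# (LMDB keeps its contents in that file).
def cache_data_size(lmdb_dir):
    data_file = os.path.join(lmdb_dir, 'data.mdb')
    return os.path.getsize(data_file) if os.path.exists(data_file) else 0

# Usage (path taken from the log above):
# cache_data_size('data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14')
```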

UttaranB127 commented 2 years ago

Is the data file inside the folder lmdb_test_s2ag_v2_cache_mfcc_14 empty for you? Its size should be around 3.7 GB. If it is empty, this could be the pyarrow version mismatch raised in issue #7. Unfortunately, I am not aware of a fix for that mismatch other than manually using different pyarrow versions for saving and loading the data, so I have uploaded the entire preprocessed data in a single folder that you can download and use.

Ibrahimatef commented 2 years ago

I checked, and it is not empty; it is 3.7 GB as you said, and I am using your preprocessed data. I tried different versions of pyarrow and still get the ZeroDivisionError. I think the error comes from this line in utils/data_preprocessor.py:

    clips = video['clips']

I did not find 'clips' as a key in the video dictionary. In fact, video is a list, and the dictionary is at index 6 of that list (video[6]). Its items are:

    {'vid': 'SWvJxasiSZ8', 'start_frame_no': 260, 'end_frame_no': 302, 'start_time': 14.133333333333333, 'end_time': 16.933333333333334, 'is_correct_motion': True, 'filtering_message': 'PASS'}

Any idea how to fix this?

UttaranB127 commented 2 years ago

If you're using the preprocessed data, then the code shouldn't enter utils/data_preprocessor.py at all. Can you check the value of the if condition on line 490 in loader.py? This line: if not os.path.exists(preloaded_dir):. The path should exist, so the condition should evaluate to False.

Ibrahimatef commented 2 years ago

Do you mean loader_v2.py? If so, I found that the condition was True, because I had set the data paths in the configuration file to the folder names used in your preprocessed data folder:

    data/ted_db/lmdb_train_s2ag_v2_cache_mfcc_14
    data/ted_db/lmdb_val_s2ag_v2_cache_mfcc_14
    data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14

At line 489, the code appends '_s2ag_v2_cache_mfcc_{}'.format(self.num_mfcc) to the folder names, so the path became lmdb_test_s2ag_v2_cache_mfcc_14_s2ag_v2_cache_mfcc_14. If I instead rename the folders to lmdb_train, lmdb_val, and lmdb_test, then _make_speaker_model raises an error at line 453:

File "/content/speech2affective_gestures/loader_v2.py", line 453, in _make_speaker_model
    lmdb_env = lmdb.open(lmdb_dir, readonly=True, lock=False)
lmdb.Error: /content/speech2affective_gestures/data/ted_db/lmdb_train: No such file or directory
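To illustrate how the doubled folder name arises (a sketch based on the suffix-appending behavior at line 489; the variable names are illustrative):

```python
# Sketch of the doubled-suffix issue: the loader appends the cache suffix to
# whatever path the config provides, so a pre-suffixed path gets it twice.
num_mfcc = 14
suffix = '_s2ag_v2_cache_mfcc_{}'.format(num_mfcc)

# Config already points at the renamed (suffixed) folder:
configured = 'data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14'
print(configured + suffix)
# data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14_s2ag_v2_cache_mfcc_14

# Whereas the loader expects the unsuffixed base path:
base = 'data/ted_db/lmdb_test'
print(base + suffix)
# data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14
```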

UttaranB127 commented 2 years ago

Did you download the speaker models from the Google Drive folder I shared? The code shouldn't try to create the speaker models either; that would require the original lmdb_XXX folders (XXX = train/val/test) from ted_db, which is exactly the error you're getting.

Ibrahimatef commented 2 years ago

Yes, I downloaded the whole folder, and I fixed this error by removing '_s2ag_v2_cache_mfcc_{}'.format(self.num_mfcc) from that line.

Ibrahimatef commented 2 years ago

After manually creating the npz/test/test folder and commenting out lines 55 and 56 in processor_v2.py, as mentioned in #7, this error appears:

Traceback (most recent call last):
  File "main_v2.py", line 147, in <module>
    s2ag_epoch=290, make_video=True, save_pkl=True)
  File "/content/speech2affective_gestures/processor_v2.py", line 1445, in generate_gestures_by_dataset
    s2ag_model_found = self.load_model_at_epoch(epoch=s2ag_epoch)
  File "/content/speech2affective_gestures/processor_v2.py", line 357, in load_model_at_epoch
    self.s2ag_generator.load_state_dict(loaded_vars['gen_model_dict'])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PoseGenerator:
    size mismatch for speaker_embedding.0.weight: copying a param with shape torch.Size([1370, 16]) from checkpoint, the shape in current model is torch.Size([1357, 16]).

UttaranB127 commented 2 years ago

In loader_v2.py, line 445: self.speaker_model = pickle.load(f). After this line, the value of self.speaker_model.n_words is 1370 for me, and I don't get that error when loading the model. My code uses the pkl file lmdb_train_s2ag_speaker_model.pkl to load the train speaker model, which is what sets n_words to 1370. Maybe the corresponding file created on your side only has 1357 words because of some missing ted_db data in your download? I'm not sure how else the error would appear. In any case, I have uploaded my speaker model files as well; those should give you the correct n_words value of 1370.