BenoitWang / Speech_Emotion_Diarization

Apache License 2.0

The dataset processing code is incorrect. #5

Closed: Jiafei127 closed this issue 6 months ago

Jiafei127 commented 6 months ago

This work is highly significant! However, I ran into problems preparing the datasets, which are stored in the following structure:

datasets
|-EmoV-DB
|--bea_Amused ....
|----***.wav ....
|-ESD
|--0001
|---Angry ....
|----***.wav ....
|-IEMOCAP
|-JL_corpus
|-RAVDESS
|-ZED

I downloaded the data as described by the author, but running train.py raises: FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ESD/0011/Angry/train'. In addition, no audio from the EmoV-DB dataset is read correctly. I would really appreciate your help in resolving this issue. Thank you very much!

BenoitWang commented 6 months ago

Hi @Jiafei127, I have the same folder structure as yours, so it should normally work. Could you try an absolute path such as "/path/to/datasets/ESD" instead of "datasets/ESD"?
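A relative path like "datasets/ESD" is resolved against the directory train.py is launched from, so one way to rule that out is to resolve the dataset root to an absolute path first. The helper below is a minimal sketch using only the standard library; `resolve_data_folder` is a hypothetical name, not part of this repository.

```python
import os


def resolve_data_folder(path: str) -> str:
    """Return an absolute, normalized version of `path`.

    Hypothetical helper (not part of the repository). Relative paths such as
    "datasets/ESD" are resolved against the current working directory, which
    may not be where the data lives if train.py is launched from elsewhere.
    """
    abs_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(abs_path):
        # Fail early with the fully resolved path, which is easier to debug
        # than a FileNotFoundError raised deep inside data preparation.
        raise FileNotFoundError(f"Dataset folder not found: {abs_path}")
    return abs_path
```

Calling `resolve_data_folder("datasets/ESD")` from the launch directory either returns the absolute path to pass on or shows exactly which path Python looked for.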

Jiafei127 commented 6 months ago

Your replies are so quick, thank you very much! I will try changing the path. I also have a small question: in the prepare_ESD.py script, does the path used to read the raw ESD data include a "train" folder? My raw data has the following structure:

|-ESD
|--0011 ....
|---Angry ....
|----***.wav ....

It does not contain the "train" folder that appears in the error "No such file or directory:.... /datasets/ESD/0011/Angry/train".

Jiafei127 commented 6 months ago

In other words, my ESD dataset is not divided into the sub_sub_folders = ["train", "evaluation", "test"] expected at line 36 of prepare_ESD.py.
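The two ESD layouts in this thread differ only in whether each emotion folder contains train/evaluation/test subfolders. A quick check like the one below can tell which layout a local copy has before running data preparation. This is a hypothetical diagnostic, not code from the repository; it assumes the `<root>/<speaker>/<emotion>/<split>/*.wav` layout and uses the split names quoted from prepare_ESD.py.

```python
import os
from typing import Dict, List

# Split folder names as quoted from prepare_ESD.py in this thread.
SPLITS = ["train", "evaluation", "test"]


def find_missing_splits(esd_root: str) -> Dict[str, List[str]]:
    """Map each "<speaker>/<emotion>" folder to the split folders it lacks.

    Hypothetical diagnostic (not part of the repository). An empty result
    means every emotion folder already has all three split subfolders.
    """
    missing: Dict[str, List[str]] = {}
    for speaker in sorted(os.listdir(esd_root)):
        speaker_dir = os.path.join(esd_root, speaker)
        if not os.path.isdir(speaker_dir):
            continue  # skip stray top-level files such as a README
        for emotion in sorted(os.listdir(speaker_dir)):
            emotion_dir = os.path.join(speaker_dir, emotion)
            if not os.path.isdir(emotion_dir):
                continue  # skip per-speaker files such as transcripts
            absent = [
                split for split in SPLITS
                if not os.path.isdir(os.path.join(emotion_dir, split))
            ]
            if absent:
                missing[f"{speaker}/{emotion}"] = absent
    return missing
```

A non-empty result for every emotion folder suggests the flat layout (WAV files directly under each emotion), which the preparation script does not expect.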

BenoitWang commented 6 months ago

You are right, sorry, I didn't check this. The script does include the split folder (see this line), so my ESD folder has this structure:

|-ESD
|--0011 ....
|---Angry ....
|----train/test/evaluation
|-----***.wav ....

Did you download ESD from the link in the README? It should be the same unless its authors have updated it.

Jiafei127 commented 6 months ago

I just re-downloaded ESD. As you suspected, the version at https://github.com/HLTSingapore/Emotional-Speech-Data no longer includes the train/evaluation/test division. 😧

BenoitWang commented 6 months ago

Thank you @Jiafei127 for spotting this. I just added a Google Drive link for downloading the old version of ESD; this is the easiest fix I see for now.

Jiafei127 commented 6 months ago

Excellent! Many thanks for your help!

Jiafei127 commented 6 months ago

Hi, I have a follow-up question. I have already obtained results with the code you provided for "microsoft/wavlm-large". Is modifying the "wav2vec2_hub" parameter in "train.yaml" sufficient to reproduce the results for "facebook/wav2vec2-large" and "facebook/hubert-large-ll60k"? Changing it directly seems to cause an error, and I'm unsure of the correct way to modify it.

BenoitWang commented 6 months ago

Hi @Jiafei127, could you please share the error and your SpeechBrain version? Could you also try pinning SpeechBrain to 0.5.13?
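Pinning itself is done with `pip install speechbrain==0.5.13`. To confirm from inside Python which version is actually active in the environment, a minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+) could look like this; the function names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version of `package`, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pinned(package: str, required: str) -> bool:
    """True only if `package` is installed at exactly `required`."""
    return installed_version(package) == required
```

For example, `check_pinned("speechbrain", "0.5.13")` returns `True` only when the active environment has exactly that release installed.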

Jiafei127 commented 6 months ago

Sorry for the late reply. Here is the environment I configured:

  speechbrain 0.5.13
  python 3.8.19
  pytorch 1.10.1
  cudatoolkit 11.3.1
  torchaudio 0.10.1
  torchvision 0.11.2

Everything works with the train.yaml you provided. However, to reproduce Table 2 of your paper, I changed only line 15 of train.yaml from wav2vec2_hub: "microsoft/wavlm-large" to wav2vec2_hub: "facebook/wav2vec2-large", and this gives the following error. Since I'm not familiar with SpeechBrain, are other changes needed to reproduce the performance of wav2vec2 or HuBERT?

speechbrain.core - Exception:
Traceback (most recent call last):
  File "train.py", line 361, in <module>
    emo_id_brain.fit(
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/core.py", line 1143, in fit
    self.on_fit_start()
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/core.py", line 797, in on_fit_start
    self.checkpointer.recover_if_possible(
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/utils/checkpoints.py", line 840, in recover_if_possible
    self.load_checkpoint(chosen_ckpt, device)
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/utils/checkpoints.py", line 853, in load_checkpoint
    self._call_load_hooks(checkpoint, device)
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/utils/checkpoints.py", line 988, in _call_load_hooks
    default_hook(obj, loadpath, end_of_epoch, device)
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/utils/checkpoints.py", line 93, in torch_recovery
    obj.load_state_dict(torch.load(path, map_location=device), strict=True)
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for HuggingFaceWav2Vec2:
    Unexpected key(s) in state_dict: "model.feature_extractor.conv_layers.1.layer_norm.weight", "model.feature_extractor.conv_layers.1.layer_norm.bias", "model.feature_extractor.conv_layers.2.layer_norm.weight", ... "model.encoder.layers.23.attention.gru_rel_pos_linear.weight", "model.encoder.layers.23.attention.gru_rel_pos_linear.bias".

[Screenshot: Table 2 of the paper]
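The traceback comes from PyTorch's `load_state_dict(..., strict=True)`: the recovered checkpoint contains parameter names (for example `gru_rel_pos_linear`, which appears to be WavLM-specific) that the newly built wav2vec2 model does not have. The pure-Python sketch below reproduces the key-name part of that strict check so a checkpoint can be compared against a model before loading; `diff_state_dict_keys` is a hypothetical helper, not a SpeechBrain or PyTorch function.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(
    checkpoint_keys: Iterable[str], model_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (unexpected, missing) parameter names, sorted.

    unexpected: in the checkpoint but not in the model.
    missing:    expected by the model but absent from the checkpoint.
    A strict load fails if either list is non-empty (shape mismatches are
    a separate check not covered here).
    """
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(ckpt - model), sorted(model - ckpt)
```

With PyTorch available, the inputs would typically be `torch.load(path, map_location="cpu").keys()` and `model.state_dict().keys()`; a long "unexpected" list full of WavLM-style names points to a checkpoint from a different backbone.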

BenoitWang commented 6 months ago

Hi @Jiafei127, it seems the model is trying to load an existing checkpoint, which is a WavLM rather than a wav2vec2 model. Try using a new experiment folder name by changing this line: https://github.com/BenoitWang/Speech_Emotion_Diarization/blob/61e6a219aee3fa8df5cb7fcb41d5a81d16558a2d/hparams/train.yaml#L10
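Since the checkpointer recovers whatever checkpoint it finds in the experiment folder (the `recover_if_possible` call in the traceback), one way to keep backbones apart is to derive the folder name from the hub ID. The sketch below is a hypothetical naming scheme, not the repository's convention; the "results" root and the seed argument are assumptions.

```python
import re


def experiment_folder(hub_id: str, seed: int, root: str = "results") -> str:
    """Build a distinct output folder per pretrained backbone (hypothetical).

    Characters other than letters, digits, ".", "_" and "-" (such as the "/"
    in "facebook/wav2vec2-large") become "_", so each hub ID maps to its own
    directory and a wav2vec2 run never recovers a WavLM checkpoint.
    """
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", hub_id)
    return f"{root}/{safe_name}/{seed}"
```

For example, `experiment_folder("facebook/wav2vec2-large", 1)` gives `"results/facebook_wav2vec2-large/1"`. SpeechBrain recipes generally accept YAML overrides on the command line (e.g. `--output_folder=...`), but the exact key on line 10 of this train.yaml is not shown here, so that name is an assumption.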