Hi @Jiafei127 I have the same folder structure as yours, so normally it should be fine. What about using an absolute path like `/path/to/datasets/ESD` instead of `datasets/ESD`?
Your replies are so quick, thank you very much! I will try changing the path. I also have a small question: in the prepare_ESD.py script, is the path used to read the raw ESD data supposed to include a "train" folder? I found that my raw data is in the following format:

```
ESD
|-- 0011
|   |-- Angry
|   |   |-- ***.wav
|   |   ...
|   ...
...
```

It does not contain a "train" folder, hence the error `No such file or directory: .../datasets/ESD/0011/Angry/train`. In other words, my ESD dataset is not divided according to `sub_sub_folders = ["train", "evaluation", "test"]` on line 36.
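For context, a minimal sketch of how the preparation script presumably walks the dataset (folder names taken from this thread; the real prepare_ESD.py may differ):

```python
import os

# Hypothetical reconstruction of the loop in prepare_ESD.py.
data_root = "datasets/ESD"
sub_sub_folders = ["train", "evaluation", "test"]  # splits the script expects

for speaker in sorted(os.listdir(data_root)):          # e.g. "0011"
    speaker_dir = os.path.join(data_root, speaker)
    for emotion in sorted(os.listdir(speaker_dir)):    # e.g. "Angry"
        for split in sub_sub_folders:
            split_dir = os.path.join(speaker_dir, emotion, split)
            # os.listdir raises FileNotFoundError here when ESD ships
            # without the train/evaluation/test subfolders.
            wavs = [f for f in os.listdir(split_dir) if f.endswith(".wav")]
```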
You are right, sorry I didn't check this. The script does include the split folder (see this line), so my ESD folder has this structure:

```
ESD
|-- 0011
|   |-- Angry
|   |   |-- train / evaluation / test
|   |   |   |-- ***.wav
|   |   |   ...
|   ...
...
```
Did you download ESD from the same link as in the README? It should be the same unless it has been updated by its authors.
I just re-downloaded ESD. As you suspected, the train/evaluation/test division is no longer available at https://github.com/HLTSingapore/Emotional-Speech-Data. 😧
Thank you @Jiafei127 for spotting this. I just added a Google Drive link for downloading the old version of ESD; this is the easiest fix I see for now.
Excellent! Many thanks for your help!
Hi, I'm here to seek further guidance. I have already reproduced the results using the code you provided for `microsoft/wavlm-large`. Is modifying the `wav2vec2_hub` parameter in the train.yaml file sufficient to reproduce the results for `facebook/wav2vec2-large` and `facebook/hubert-large-ll60k`? Directly modifying it leads to an error, and I'm unsure about the correct approach.
Hi @Jiafei127 could you show me the error as well as your speechbrain version, please? What about pinning it to 0.5.13?
Sorry for the late reply to the message. Here's the environment I configured:

```
speechbrain   0.5.13
python        3.8.19
pytorch       1.10.1
cudatoolkit   11.3.1
torchaudio    0.10.1
torchvision   0.11.2
```
It works perfectly with the train.yaml you provided. But to reproduce Table 2 in your paper, I only changed `wav2vec2_hub: microsoft/wavlm-large` on line 15 of train.yaml to `wav2vec2_hub: facebook/wav2vec2-large`, which gives the following error. (Since I'm not familiar with speechbrain: are other changes needed to reproduce the performance of wav2vec2 or hubert?)
```
speechbrain.core - Exception:
Traceback (most recent call last):
  File "train.py", line 361, in <module>
    emo_id_brain.fit(
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/core.py", line 1143, in fit
    self.on_fit_start()
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/core.py", line 797, in on_fit_start
    self.checkpointer.recover_if_possible(
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/utils/checkpoints.py", line 840, in recover_if_possible
    self.load_checkpoint(chosen_ckpt, device)
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/utils/checkpoints.py", line 853, in load_checkpoint
    self._call_load_hooks(checkpoint, device)
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/utils/checkpoints.py", line 988, in _call_load_hooks
    default_hook(obj, loadpath, end_of_epoch, device)
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/speechbrain/utils/checkpoints.py", line 93, in torch_recovery
    obj.load_state_dict(torch.load(path, map_location=device), strict=True)
  File "/miniconda3/envs/sed110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for HuggingFaceWav2Vec2:
    Unexpected key(s) in state_dict: "model.feature_extractor.conv_layers.1.layer_norm.weight", "model.feature_extractor.conv_layers.1.layer_norm.bias", "model.feature_extractor.conv_layers.2.layer_norm.weight",
    ...
    "model.encoder.layers.23.attention.gru_rel_pos_linear.weight", "model.encoder.layers.23.attention.gru_rel_pos_linear.bias".
```
Hi @Jiafei127 it seems that the model is trying to load an existing checkpoint which is a wavlm, not a wav2vec. Maybe try using a new experiment folder by changing this [line](https://github.com/BenoitWang/Speech_Emotion_Diarization/blob/61e6a219aee3fa8df5cb7fcb41d5a81d16558a2d/hparams/train.yaml#L10).
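A minimal sketch of the two edits in hparams/train.yaml (key names and line numbers follow the thread; the exact layout of the recipe may differ):

```yaml
# Line 15: swap the self-supervised encoder.
wav2vec2_hub: facebook/wav2vec2-large               # was: microsoft/wavlm-large

# Line 10: point the run at a fresh experiment folder so the checkpointer
# does not try to restore the old wavlm checkpoint into the wav2vec2 model.
output_folder: !ref results/wav2vec2-large/<seed>   # was e.g. results/wavlm-large/<seed>
```

Alternatively, deleting or moving the old experiment folder should have the same effect, since the error comes from `recover_if_possible` finding a stale checkpoint there.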
This work is highly significant! But I had some problems setting up the datasets, which are stored in the following format:

```
datasets
|-- EmoV-DB
|   |-- bea_Amused
|   |   |-- ***.wav
|   |   ...
|-- ESD
|   |-- 0001
|   |   |-- Angry
|   |   |   |-- ***.wav
|   |   ...
|-- IEMOCAP
|-- JL_corpus
|-- RAVDESS
|-- ZED
```

Everything was downloaded as described by the author, but executing train.py gives the error:

```
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ESD/0011/Angry/train'
```

Also, the EmoV-DB dataset does not read in any audio correctly. I'm very much looking forward to the author's help in resolving this issue, thank you very much!