FriedaSmith closed this 3 years ago
Hmm, it looks like the transcription is not being read. Can you share a couple of rows from your training.tsv file so that I can check whether it is compatible with the code? Most likely the code could not find a correct way to load the labels.
The structure of the iemocap folder is as follows:
Part of iemocap_01F.train.csv is as follows:
This looks like a csv parsing issue. I think you need to wrap the last column in quotes ("). I can see that some rows have quotes but others do not. Without quotes, any comma inside the transcription splits it into two columns. Also, I think you can remove all the cc tags ([BREATH], [LAUGHTER], etc.). I didn't use any of them, and I'm not sure what happens if you leave them in. Let me know if this works.
I removed all the cc tags ([BREATH], [LAUGHTER], etc.) and added quotes (") to the last column, but I still get this error:
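A minimal sketch of the cleanup described above, assuming the transcription is the last column and that unquoted commas have split it into extra fields. The function names, the column count of 3, and the sample rows are hypothetical and not taken from the repository, so adjust them to your files.

```python
import csv
import io
import re

# Matches bracketed annotations such as [BREATH] or [LAUGHTER].
TAG_RE = re.compile(r"\[[A-Z_]+\]")

def clean_rows(raw_text, n_columns=3):
    """Re-join over-split rows and strip bracketed tags from the last column.

    n_columns is an assumption about the file layout, not a known value.
    """
    cleaned = []
    for row in csv.reader(io.StringIO(raw_text)):
        if not row:
            continue
        head, tail = row[: n_columns - 1], row[n_columns - 1 :]
        # An unquoted comma in the transcription yields extra fields;
        # glue them back together into one last column.
        text = ",".join(tail)
        text = TAG_RE.sub(" ", text)
        text = " ".join(text.split())  # collapse repeated whitespace
        cleaned.append(head + [text])
    return cleaned

def to_csv(rows):
    """Serialize rows with every field quoted, so commas in text are safe."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

For example, calling to_csv(clean_rows(open(path).read())) on one file and writing the result back would leave every transcription quoted and free of the bracketed tags. If a cleaned transcription ends up empty, it is probably better to drop that row than to keep it.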
100% 97/97 [00:08<00:00, 12.06it/s]
Traceback (most recent call last):
File "run_emotion.py", line 547, in <module>
main()
File "run_emotion.py", line 543, in main
trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1325, in train
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch)
File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1426, in _maybe_log_save_evaluate
metrics = self.evaluate()
File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 2031, in evaluate
metric_key_prefix=metric_key_prefix,
File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 2260, in evaluation_loop
metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
File "run_emotion.py", line 519, in compute_metrics
wer = wer_metric.compute(predictions=ctc_pred_str, references=ctc_label_str)
File "/usr/local/lib/python3.7/dist-packages/datasets/metric.py", line 402, in compute
output = self._compute(predictions=predictions, references=references, **kwargs)
File "/root/.cache/huggingface/modules/datasets_modules/metrics/wer/d630b0e978819dda4b232fbce9934c6221a04bb2fcea1bfe8e7cb177339b3d86/wer.py", line 103, in _compute
measures = compute_measures(reference, prediction)
File "/usr/local/lib/python3.7/dist-packages/jiwer/measures.py", line 188, in compute_measures
truth, hypothesis, truth_transform, hypothesis_transform
File "/usr/local/lib/python3.7/dist-packages/jiwer/measures.py", line 244, in _preprocess
raise ValueError("the ground truth cannot be an empty")
ValueError: the ground truth cannot be an empty
1% 100/9600 [01:50<2:55:30, 1.11s/it]
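The ValueError in the traceback above comes from jiwer and appears to be raised when a reference string is empty, which would be consistent with a transcription column being read as blank. One possible workaround, sketched here under the assumption that compute_metrics holds parallel lists of decoded predictions and references, is to drop pairs whose reference is empty before computing WER. The helper name is hypothetical and is not part of run_emotion.py.

```python
def drop_empty_references(predictions, references):
    """Keep only (prediction, reference) pairs whose reference contains text."""
    kept = [
        (pred, ref)
        for pred, ref in zip(predictions, references)
        if ref and ref.strip()
    ]
    if not kept:
        # Nothing usable is left; the caller should skip the WER computation.
        return [], []
    preds, refs = zip(*kept)
    return list(preds), list(refs)

# Possible use inside compute_metrics (names taken from the traceback):
#   preds, refs = drop_empty_references(ctc_pred_str, ctc_label_str)
#   if refs:
#       wer = wer_metric.compute(predictions=preds, references=refs)
```

This only hides the symptom. If many references are empty, the csv files are still being parsed incorrectly and should be fixed at the source.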
I see. I think I can prepare the csv files myself and upload them after testing, so that you can run with them.
I just uploaded the csv files. You need to replace the 'path_to_wavs' string in the files with the actual path where your wavs are stored. For example, if your wavs are stored at /wav_path/, just run:
for f in iemocap/*.csv; do sed -i 's|/path_to_wavs|/wav_path|' "$f"; done
(use the absolute path here).
I have tested them and they should run. Let me know if this works.
The .csv files you uploaded work fine. Thank you very much for your help.
How were these csv files created? What is the datasets module in the code?
import datasets
Could anyone please help me with this?
@Coding511, the csv files are generated by parsing the IEMOCAP dataset. Once you obtain the dataset, it should be easy to write a script that generates csv files like mine, or you can just use mine.
The datasets package is the Hugging Face datasets library; see https://huggingface.co/docs/datasets/index.
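A rough sketch of what such a script could look like, not the script actually used for the uploaded files. It assumes the label and transcription line formats shown in the comments, which should be checked against your copy of IEMOCAP. The output column names (file, emotion, text) and the function names are hypothetical.

```python
import csv
import io
import re

# Assumed line formats (verify against your copy of the corpus):
#   label file:      "[6.2901 - 8.2357]<TAB>Ses01F_impro01_F000<TAB>neu<TAB>[2.5, 2.5, 2.5]"
#   transcript file: "Ses01F_impro01_F000 [006.2901-008.2357]: Excuse me."
LABEL_RE = re.compile(r"^\[[\d.]+ - [\d.]+\]\t(\S+)\t(\w+)\t")
TEXT_RE = re.compile(r"^(\S+) \[[\d.]+-[\d.]+\]: (.*)$")

def parse_labels(label_text):
    """Map utterance id -> emotion label."""
    labels = {}
    for line in label_text.splitlines():
        m = LABEL_RE.match(line)
        if m:
            labels[m.group(1)] = m.group(2)
    return labels

def parse_transcripts(transcript_text):
    """Map utterance id -> transcription text."""
    texts = {}
    for line in transcript_text.splitlines():
        m = TEXT_RE.match(line.strip())
        if m:
            texts[m.group(1)] = m.group(2).strip()
    return texts

def build_csv(label_text, transcript_text, wav_dir="/path_to_wavs"):
    """Join labels and transcriptions into a fully quoted csv string."""
    labels = parse_labels(label_text)
    texts = parse_transcripts(transcript_text)
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["file", "emotion", "text"])  # hypothetical column names
    for utt_id in sorted(labels):
        text = texts.get(utt_id, "")
        if not text:
            continue  # skip utterances that have no transcription
        writer.writerow([f"{wav_dir}/{utt_id}.wav", labels[utt_id], text])
    return buf.getvalue()
```

Quoting every field keeps commas inside transcriptions from being read as column separators, and skipping utterances without text avoids empty references during WER evaluation.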
Hello. When I run
bash run.sh
I get an error. The error details are as follows: