Closed Gpwner closed 2 years ago
Hello, thanks for testing, and I'd like to help.
(1) In the script, I use per_device_train_batch_size=2 and accumulated_gradient_steps=4, making the effective batch size 8. The reason is that I don't have GPUs with large memory (like a V100). In your case, if you set per_device_train_batch_size=8, there is no need for accumulation; otherwise your effective batch size becomes 32. That means far fewer optimizer updates and results in worse performance. So you can change the ACC to 1 and test.
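For concreteness, the batch-size arithmetic above can be sketched as follows (illustrative only; the helper names are mine, not from the repo):

```python
# Effective batch size = per-device batch size x accumulation steps x #GPUs.
def effective_batch_size(per_device_bs, accum_steps, n_gpus=1):
    return per_device_bs * accum_steps * n_gpus

# Fewer optimizer updates per epoch when the effective batch size grows.
def updates_per_epoch(n_samples, per_device_bs, accum_steps, n_gpus=1):
    return n_samples // effective_batch_size(per_device_bs, accum_steps, n_gpus)

print(effective_batch_size(2, 4))  # 8  (the script's original setting)
print(effective_batch_size(8, 4))  # 32 (per_device_bs=8 with ACC still 4)
print(effective_batch_size(8, 1))  # 8  (per_device_bs=8 with ACC=1 restores it)
```

With 32 instead of 8, each epoch performs a quarter of the updates, which is the mechanism behind the worse result.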
(2) For alpha > 0 (in your case, 0.1), use a learning rate of 5e-5 instead of 1e-5. As stated in the paper, 1e-5 is only used when alpha = 0.
(3) Looking at your log, your WER is pretty low. Comparing with my logs on 01F, I typically get around 0.2. I suspect the ASR part is well trained but the classification part is under-trained. Again, this could be due to the batch size or learning rate issue.
(4) A very tricky thing you will observe is that the evaluation loss will actually increase after around 20 epochs. It looks like overfitting, but the evaluation WER and accuracy will keep getting better. One reason could be that the training CTC loss is not the same as the evaluation metric (WER), causing this strange behavior. So when you see the eval loss go up, don't stop; keep it running. The final accuracy I can get on 01F is around 0.74.
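To see why the loss and the metric can diverge: WER is a word-level edit distance, a discrete quantity that the CTC loss only optimizes indirectly, so eval loss can rise while WER keeps improving. A minimal, self-contained WER implementation for illustration (the actual script presumably uses a library metric):

```python
# Minimal word error rate (WER): Levenshtein distance over word tokens,
# normalized by reference length. Illustration only, not the repo's metric.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution/match
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the bat sat"))  # 0.3333333333333333 (1 substitution / 3 words)
```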
Hope this helps. Thanks!
Yes, it works.
But I have a question: how can I load the CTCTrainer model after training? There is no from_pretrained() in it...
I am really new to the Hugging Face Trainer, so can you help?
I am not entirely sure, but it seems to be something like this:
model = Wav2Vec2ForCTCnCLS.from_pretrained(
    'output/tmp/checkpoint-93500',
    cache_dir=model_args.cache_dir,
    # gradient_checkpointing=training_args.gradient_checkpointing,
    vocab_size=len(processor.tokenizer),
    cls_len=len(cls_label_map),
    alpha=model_args.alpha,
)
...
trainer = CTCTrainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)
But I am confused: if I uncomment gradient_checkpointing=training_args.gradient_checkpointing, then I get this error:
model = Wav2Vec2ForCTCnCLS.from_pretrained(
  File "/home/*/miniconda3/envs/INTERSpeech21/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1402, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: __init__() got an unexpected keyword argument 'gradient_checkpointing'

Process finished with exit code 1
Yes, from_pretrained() will load the pretrained model. Since Wav2Vec2ForCTCnCLS inherits from the Hugging Face PreTrainedModel class, from_pretrained() works the same way as usual.
As for the gradient_checkpointing part, I am not sure either; I have never used that feature. If you just comment it out, can you load the model and run the code smoothly?
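In case it helps: recent versions of transformers no longer accept gradient_checkpointing as a keyword argument to the model constructor, which would explain the TypeError above. One possible workaround (untested here, and assuming transformers >= 4.11, where this method exists on PreTrainedModel) is to enable it on the model after loading instead of passing it to from_pretrained():

```
# Hypothetical fix (not from the repo): drop the gradient_checkpointing
# kwarg and enable checkpointing on the loaded model instead.
model = Wav2Vec2ForCTCnCLS.from_pretrained(
    'output/tmp/checkpoint-93500',
    cache_dir=model_args.cache_dir,
    vocab_size=len(processor.tokenizer),
    cls_len=len(cls_label_map),
    alpha=model_args.alpha,
)
if training_args.gradient_checkpointing:
    model.gradient_checkpointing_enable()  # PreTrainedModel method, transformers >= 4.11
```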
Yes. Thanks for your help; I will close this issue.
As seen in the README, training a model on 9 sessions takes a long time when I only have 2 Nvidia V100s. So I changed the code in run_emotion.py from: to
And kept only these files in iemocap:
Then I call the predict API at the end like this:
Here is my run.sh:
Here is the log of my loss:
The final accuracy is 0.591705069124424, which is not as good as the result in the paper.