b04901014 / FT-w2v2-ser

Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
MIT License

Questions about batch size and clustering model #12

Open kawlil opened 11 months ago

  1. What is the rationale for the default batch size of 64 in the pre-training, continued pre-training, and fine-tuning loops? Others have mentioned that they had to reduce the batch size to run the code on their systems, given that the original code uses a single GPU. Did a batch size of 64 produce the best results in your experiments?
  2. I noticed that cluster.py accepts either wav2vec or wav2vec2 as the model_type. Why did you choose wav2vec2 as the default? Could HuBERT or another transformer-based model have been used instead?