Questions about batch size and clustering model

What is the rationale for a default batch size of 64 in the pre-training, continued pre-training, and fine-tuning loops? Others have mentioned that they had to reduce the batch size to run the code on their systems, given that the original code uses a single GPU. Is 64 the batch size that produced the best results in your experiments?
I noticed that cluster.py accepts either wav2vec or wav2vec2 as the model_type. Why did you choose wav2vec2 as the default model? Could HuBERT or another transformer-based model have been used instead?

b04901014 / FT-w2v2-ser