arxyzan / data2vec-pytorch

PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI
MIT License

Which hyperparameter tuning method to use? #5

Closed · anirudh2019 closed this 2 years ago

anirudh2019 commented 2 years ago

First of all, thank you for the great work @AryanShekarlaban and @kabouzeid!

Quick question: I don't have much experience with training big models like Transformers, and I see that there are many frameworks and algorithms for hyperparameter tuning on the internet. Could you suggest a hyperparameter tuning framework and algorithm for data2vec?

Thank you!

arxyzan commented 2 years ago

Hello @anirudh2019, glad to see this repo has been helpful to you. Generally, hyperparameter tuning is a trial-and-error process, and making it actually pay off requires a lot of effort. It gets even harder with large models like Transformers, so trying to tune hyperparameters for pretraining is really not recommended unless you have access to very powerful compute. The recommended approach is to tune hyperparameters for downstream tasks and distillation. In that case, you can refer to the paper of the encoder you use (RoBERTa, BEiT, Wav2Vec2, etc.) for its best practices on hyperparameter tuning.
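To make that concrete, here is a minimal sketch of a downstream search using Optuna, which is just one framework among many; this repo doesn't prescribe one. The random feature tensors and the tiny linear head are placeholders standing in for the outputs of a pretrained (frozen) data2vec encoder and your real task head:

```python
import optuna
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: stands in for features from a frozen data2vec encoder.
features = torch.randn(256, 768)
labels = torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(features, labels), batch_size=32)

def objective(trial: optuna.Trial) -> float:
    # Search only the cheap, high-impact downstream knobs.
    lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-4, 1e-1, log=True)

    head = nn.Linear(768, 2)  # task head trained on top of the frozen encoder
    optimizer = torch.optim.AdamW(head.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()

    head.train()
    for _ in range(3):  # a few short epochs is enough to rank trials
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(head(x), y)
            loss.backward()
            optimizer.step()

    # Minimize the final loss; in practice, return a held-out validation metric.
    return loss.item()

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```

With Optuna's default TPE sampler, 20-50 short trials like this usually narrow down the learning rate well before you commit to full fine-tuning runs.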

Best, Aryan

anirudh2019 commented 2 years ago

Thank you for your suggestion!