Hello @fqez,
Regarding the hardware, we used just one single TPU v3-8 (not a pod). It is hard to say how much wall-clock time the training took since the hardware is preemptible, but as stated in the paper we trained for 1M steps and later another 1M steps (thanks to TFRC support), so 2M steps in total.
About the second question, I don't have much experience on that matter, but I'm pretty sure you can find something at https://github.com/huggingface/transformers/issues.
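As a rough first check, you could load the checkpoint with the `transformers` library and measure its memory footprint and CPU inference yourself. The snippet below is only a sketch; `your-org/your-model` is a placeholder, not this repository's actual checkpoint name.

```python
# Minimal sketch: load a checkpoint with transformers and estimate its
# memory footprint. "your-org/your-model" is a placeholder model id.
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_name = "your-org/your-model"  # placeholder, replace with the real id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Rough fp32 memory estimate for inference (parameters only, no activations).
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.0f}M, ~{n_params * 4 / 1e9:.1f} GB in fp32")

# Quick check that masked-language-model inference works on CPU.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask(f"Paris is the {tokenizer.mask_token} of France.")[0])
```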
Thanks for your interest and sorry for the late response. Regards!
Hello!
I have read the paper and I would like to replicate the training of this model from scratch. The thing is that you specify the pre-training configuration (hyper-parameters), but not the hardware cost or the time needed to train the models you have released. Can you provide that information? I'm interested in knowing the number of TPU v3-8 pods (the preemptible ones, as mentioned in the article) used for training and the training time (hours, days, weeks), if possible. I would like to estimate how much it would cost to train a model like yours. Also, what minimum hardware would be required to run such a model in a production environment?
Thank you in advance! Regards :)