dccuchile / beto

BETO - Spanish version of the BERT model

Training requirements #15

Closed fqez closed 3 years ago

fqez commented 4 years ago

Hello!

I have read the paper and would like to replicate the training of this model from scratch. You specify the pre-training configuration (hyper-parameters), but not the hardware cost and the time it took to train the models you released. Could you provide that information? I'm interested in the number of TPU v3-8 pods (the preemptible ones, as mentioned in the article) used for training and the training time (hours, days, weeks), so I can estimate how much it would cost to train a model like yours. Also, what minimum hardware would be required to run such a model in a production environment?

Thank you in advance! Regards :)

josecannete commented 3 years ago

Hello @fqez,

Regarding the hardware, we used just one single TPU v3-8 (not a pod). It is hard to say how long training took since the hardware is preemptible, but as stated in the paper we trained for 1M steps and later for another 1M steps (thanks to TFRC support), so 2M steps in total.
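If it helps with the cost estimate, a back-of-envelope sketch like the one below is a reasonable starting point. Note that the throughput and hourly price in it are placeholders, not figures from the paper or this thread; you would need to substitute your own benchmark numbers and current GCP pricing.

```python
# Back-of-envelope training-cost estimate for 2M steps on a single TPU v3-8.
# All numbers marked "assumed" are placeholders -- measure / look up your own.

total_steps = 2_000_000        # 1M steps + another 1M steps, as described above
steps_per_second = 2.0         # assumed throughput on one TPU v3-8; benchmark your setup
tpu_price_per_hour = 2.40      # assumed preemptible TPU v3-8 hourly rate; check current GCP pricing

training_hours = total_steps / steps_per_second / 3600
estimated_cost = training_hours * tpu_price_per_hour

print(f"~{training_hours:.0f} hours (~{training_hours / 24:.0f} days)")
print(f"~${estimated_cost:,.0f} at the assumed hourly rate")
```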

About the second question, I don't have experience in that area, but I'm pretty sure you can find something at https://github.com/huggingface/transformers/issues.
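For reference, a minimal inference sketch with the transformers library looks roughly like the following (assuming the checkpoint published on the Hugging Face Hub as dccuchile/bert-base-spanish-wwm-cased). A BERT-base model of this size fits comfortably on a single consumer GPU, and can also run on CPU for low-throughput use.

```python
# Minimal sketch: masked-token prediction with BETO via Hugging Face transformers.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "dccuchile/bert-base-spanish-wwm-cased"  # assumed Hub checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

text = "Los estudiantes aprenden en la [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and print the highest-scoring token for it.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```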

Thanks for your interest and sorry for the late response. Regards!