dbmdz / berts

DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models
MIT License

How to train bert-base-italian-* models? #4

Status: Closed (nikhilno1 closed this issue 4 years ago)

nikhilno1 commented 4 years ago

Thanks for sharing. I want to train a different language model (Hindi). How did you train your bert-base-italian-* models? Are those steps covered anywhere?

stefan-it commented 4 years ago

Hi @nikhilno1,

for training the Italian models we did the following steps:

I do plan to write a cheatsheet for an upcoming BERT model, where I use the awesome new Hugging Face tokenizers library for creating the BERT vocab!

stefan-it commented 4 years ago

Hi @nikhilno1,

for the Turkish BERT model I created a cheatsheet for the training process:

https://github.com/stefan-it/turkish-bert/blob/master/CHEATSHEET.md

It also shows how to generate a BERT-compatible vocab.
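For a rough idea of what generating a BERT-compatible vocab can look like, here is a minimal sketch using the Hugging Face `tokenizers` library. It is not copied from the cheatsheet: the function name `train_bert_vocab`, the file paths, and all parameter values are assumptions chosen for illustration, and it assumes a reasonably recent `tokenizers` release that provides `save_model`. For a language like Hindi, accent stripping is switched off here on the assumption that it would otherwise drop combining vowel signs.

```python
from pathlib import Path


def train_bert_vocab(corpus_files, output_dir, vocab_size=32000, min_frequency=2):
    """Train a WordPiece vocab and write it as vocab.txt (hypothetical helper)."""
    # Imported here so the module can be loaded even if `tokenizers` is missing.
    from tokenizers import BertWordPieceTokenizer

    tokenizer = BertWordPieceTokenizer(
        clean_text=True,
        handle_chinese_chars=False,
        strip_accents=False,  # assumption: keep combining marks (e.g. Devanagari)
        lowercase=False,      # assumption: a cased model
    )
    tokenizer.train(
        files=list(corpus_files),
        vocab_size=vocab_size,
        min_frequency=min_frequency,
        special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
        show_progress=False,
    )

    # save_model expects an existing directory and writes vocab.txt into it.
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    tokenizer.save_model(str(out))
    return out / "vocab.txt"
```

A call such as `train_bert_vocab(["corpus.txt"], "hindi-vocab")` (both names are placeholders) would write `hindi-vocab/vocab.txt`, one token per line, which is the format BERT pretraining scripts typically expect.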

I hope this helps + good luck with the Hindi model!

nikhilno1 commented 4 years ago

Thanks for sharing. Will go through it.