google-research / albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Apache License 2.0

train albert on custom dataset #239

Closed mlcom closed 3 years ago

mlcom commented 3 years ago

I have managed to train a language model on custom data for the Persian language: https://mlcom.github.io/Create-Language-Model/ You can follow it step by step: 1) Create the dataset. 2) Create a tokenizer for the custom data. 3) Train an ALBERT Large model on the custom data.
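For readers who want a concrete starting point for step 1, here is a minimal sketch of writing a raw corpus in the layout that BERT-style pretraining scripts generally expect: one sentence per line, with a blank line between documents. The helper names `split_sentences` and `write_corpus` are hypothetical and not taken from the linked guide, and the punctuation-based sentence splitter is a naive assumption; a proper Persian sentence segmenter would likely do better.

```python
import re
from pathlib import Path

# Assumption: a sentence ends at '.', '!', '?' or the Arabic question mark,
# followed by whitespace. This is a naive rule, not a real segmenter.
_SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def split_sentences(text):
    """Split text into sentences on whitespace that follows end punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def write_corpus(documents, path):
    """Write one sentence per line, with a blank line between documents.

    Documents that contain no sentences are skipped.
    Returns the number of documents actually written.
    """
    blocks = []
    for doc in documents:
        sentences = split_sentences(doc)
        if sentences:
            blocks.append("\n".join(sentences))
    Path(path).write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
    return len(blocks)
```

The resulting text file could then serve as input for steps 2 and 3, for example as the training text for a SentencePiece tokenizer and as the input to the repository's pretraining data script, though the exact commands and settings depend on your setup.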