brightmart / albert_zh

A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS, large-scale Chinese pre-trained ALBERT models
https://arxiv.org/pdf/1909.11942.pdf

use Multilingual pretrain model Bert #40

Open kewin1807 opened 4 years ago

kewin1807 commented 4 years ago

Please tell me: can I use the multilingual pretrained model from BERT to train on custom data with the albert_zh code?

brightmart commented 4 years ago

You can have a try, but be aware that there are some differences between BERT and ALBERT in modeling.py. Why do you want to train a multilingual model?
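For context, here is a toy TF1-style sketch (my own illustration, not the actual modeling.py code) of the main structural difference: BERT creates a separate set of weights for every transformer layer, while ALBERT reuses one set across all layers.

```python
import tensorflow as tf  # TensorFlow 1.x, as used by this repo


def bert_style_encoder(x, hidden_size=768, num_layers=12):
    # BERT: each layer gets its own variable scope, hence its own weights.
    for i in range(num_layers):
        with tf.variable_scope("layer_%d" % i):
            x = tf.layers.dense(x, hidden_size, activation=tf.nn.relu, name="ffn")
    return x


def albert_style_encoder(x, hidden_size=768, num_layers=12):
    # ALBERT: one scope is reused with AUTO_REUSE, so every iteration
    # applies the same weights (cross-layer parameter sharing).
    # Assumes x's last dimension is already hidden_size.
    for _ in range(num_layers):
        with tf.variable_scope("shared_layer", reuse=tf.AUTO_REUSE):
            x = tf.layers.dense(x, hidden_size, activation=tf.nn.relu, name="ffn")
    return x
```

Counting tf.trainable_variables() after building each graph shows the shared version holds roughly 1/num_layers of the encoder weights, which is why a BERT checkpoint's per-layer variables do not map one-to-one onto ALBERT's.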

kewin1807 commented 4 years ago

I want to use the model for the Vietnamese language. I know the important change is parameter sharing. How can I train it on my language? Thanks for support =)

brightmart commented 4 years ago

1. Change vocab.txt in ./albert_config to a vocabulary for your language, then set non_chinese to True when you create the pretraining data with create_pretraining_data.py.
2. Then pre-train with run_pretraining.py.

See the example commands below.
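Roughly like this. This is a sketch only: the corpus paths and hyperparameter values are placeholders, and the flag names follow the BERT-style scripts this repo is based on, so double-check them against the two scripts.

```bash
# Step 1: build pretraining records from your corpus.
# --non_chinese=True disables the Chinese-specific processing.
python create_pretraining_data.py \
  --input_file=./data/vi_corpus.txt \
  --output_file=./data/vi_pretrain.tf_record \
  --vocab_file=./albert_config/vocab.txt \
  --non_chinese=True \
  --do_lower_case=True \
  --max_seq_length=512 \
  --max_predictions_per_seq=51 \
  --masked_lm_prob=0.10 \
  --dupe_factor=10

# Step 2: pre-train on the generated records.
# max_seq_length and max_predictions_per_seq must match step 1.
python run_pretraining.py \
  --input_file=./data/vi_pretrain.tf_record \
  --output_dir=./my_albert_vi \
  --bert_config_file=./albert_config/albert_config_base.json \
  --do_train=True \
  --train_batch_size=32 \
  --learning_rate=1e-4 \
  --num_train_steps=125000 \
  --max_seq_length=512 \
  --max_predictions_per_seq=51
```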

kewin1807 commented 4 years ago

Okay, thanks for the support. Best repo =)

kewin1807 commented 4 years ago

I have tried to pretrain on my dataset, and the loss is very small, but the accuracy does not improve. How can I improve the result?

geekboood commented 4 years ago

@brightmart Can we have a multilingual model for just Chinese and English? In practical scenarios we encounter many English words in app names, music titles, all of Apple's product names, and so on, and Google's multilingual model covers too many languages. Our daily life cannot avoid English; you can see that Apple tries to use purely Chinese in its products, such as replacing Finder with 访达, which I think is a total mess. A language model for just Chinese and English could have a huge impact on both research and industry, and many multilingual tasks could benefit from it.