dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

Korean BERT pre-trained #939

Closed by haven-jeon 4 years ago

haven-jeon commented 5 years ago

Description

References

eric-haibin-lin commented 5 years ago

Nice! Are you porting it back to gluonnlp's model zoo?

haven-jeon commented 5 years ago

> Nice! Are you porting it back to gluonnlp's model zoo?

Sure.

szha commented 4 years ago

we plan to make a release around Oct 20th in preparation for our EMNLP tutorial (Nov 3rd). @haven-jeon would you be able to include this feature?

muhyun commented 4 years ago

> Nice! Are you porting it back to gluonnlp's model zoo?

> Sure.

Heewon-nim, I am a data scientist at AWS based in Korea, and I'd like to talk with you about implementing the Korean BERT pre-trained model in GluonNLP. If you are interested in working together, please reply. :)

haven-jeon commented 4 years ago

> we plan to make a release around Oct 20th in preparation for our EMNLP tutorial (Nov 3rd). @haven-jeon would you be able to include this feature?

Sorry for missing this message. I can prepare a patch by the end of November.

haven-jeon commented 4 years ago

> Nice! Are you porting it back to gluonnlp's model zoo?

> Sure.

> Heewon-nim, I am a data scientist at AWS based in Korea, and I'd like to talk with you about implementing the Korean BERT pre-trained model in GluonNLP. If you are interested in working together, please reply. :)

Thanks for your suggestion. I'd be happy to work together.

By the way, how about doing this together with Korean ALBERT (https://github.com/MrBananaHuman/KalBert)?

jamiekang commented 4 years ago

The recent official ALBERT code is here: https://github.com/google-research/google-research/tree/master/albert

haven-jeon commented 4 years ago

Hi @eric-haibin-lin. I'm currently working on porting KoBERT to GluonNLP. Could you upload these binaries to the model zoo?

- model: https://kobert.blob.core.windows.net/models/kobert/mxnet/bert_12_768_12_kobert_news_wiki_ko_cased-ccf0593e.params (sha1sum: ccf0593e03b91b73be90c191d885446df935eb64)

- vocab (tokenizer): https://kobert.blob.core.windows.net/models/kobert/tokenizer/kobert_news_wiki_ko_cased-f86b1a83.zip (sha1sum: f86b1a8355819ba5ab55e7ea4a4ec30fdb5b084f)
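For anyone mirroring these artifacts, a minimal stdlib-only Python sketch of the checksum verification (the file names here are just the basenames from the URLs above, and `sha1sum`/`verify` are hypothetical helper names, not part of GluonNLP):

```python
import hashlib

def sha1sum(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected digests, copied from this issue.
EXPECTED = {
    "bert_12_768_12_kobert_news_wiki_ko_cased-ccf0593e.params":
        "ccf0593e03b91b73be90c191d885446df935eb64",
    "kobert_news_wiki_ko_cased-f86b1a83.zip":
        "f86b1a8355819ba5ab55e7ea4a4ec30fdb5b084f",
}

def verify(path, expected):
    """Raise if the file at `path` does not match the expected digest."""
    actual = sha1sum(path)
    if actual != expected:
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return True
```

This mirrors what `sha1sum <file>` reports on the command line, so either can be used to check a download before loading it.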

leezu commented 4 years ago

@szha can help.

szha commented 4 years ago

Done. Sorry for the delay.

haven-jeon commented 4 years ago

Thanks @szha, I will open a PR.