google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Data source you used for training the wordpiece model in your original paper #1379

Open lsy641 opened 1 year ago

lsy641 commented 1 year ago

Thanks for reading my post!

I am doing research on vocabularies. May I know what data you used to train the WordPiece model with the 30k vocabulary mentioned in the BERT paper?
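For reference, I can build a comparable vocabulary with open-source tooling. Below is a minimal sketch, assuming the HuggingFace `tokenizers` library and a placeholder `corpus.txt`; it is not the original Google pipeline, which as far as I know was never released. I would still like to know which corpus the released `vocab.txt` was actually trained on.

```python
# Minimal sketch: train a 30k WordPiece vocabulary with the
# HuggingFace `tokenizers` library (an assumption on my side,
# not the pipeline used for the released BERT vocab).
from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(
    files=["corpus.txt"],   # placeholder path to a plain-text corpus
    vocab_size=30000,       # matches the 30k vocabulary size in the paper
    min_frequency=2,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.save_model(".")   # writes vocab.txt to the current directory
```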
