dhlee347 / pytorchic-bert

Pytorch Implementation of Google BERT
Apache License 2.0
591 stars 179 forks

Any sample dataset for pre-training? #7

SeekPoint closed this issue 5 years ago

dhlee347 commented 5 years ago

You have to build your own dataset, for example by web crawling. The Toronto Book Corpus is no longer available online.
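As a rough illustration of building your own dataset, here is a minimal sketch that turns a list of raw text documents (for example, pages you crawled yourself) into a single corpus string. It assumes the layout used by Google's original BERT pre-training script: one sentence per line, with a blank line between documents. Whether this repository's pre-training script expects exactly that layout should be checked against its code. The names `split_sentences` and `build_corpus` are hypothetical helpers, and the regex splitter is deliberately naive.

```python
import re


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?' plus whitespace.

    Assumption: good enough for a sketch; a real pipeline would likely use
    a proper sentence tokenizer.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def build_corpus(documents):
    """Return corpus text: one sentence per line, blank line between documents.

    Assumption: this mirrors the input layout of Google's original BERT
    pre-training script, not a format confirmed for this repository.
    """
    blocks = []
    for doc in documents:
        sentences = split_sentences(doc)
        if sentences:  # skip empty documents
            blocks.append("\n".join(sentences))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Placeholder documents standing in for crawled text.
    docs = [
        "Hello world. How are you?",
        "Second doc here.",
    ]
    print(build_corpus(docs), end="")
```

The returned string can be written to a text file and passed to the pre-training script as its data file, once the expected format has been confirmed.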