-
It seems I cannot get Docker to correctly access the DNS system and resolve IP addresses.
Thus I have had to run the data downloads manually.
However, I cannot find the download_files.py script neede…
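When a container cannot resolve hostnames, a common workaround is to point it at an explicit DNS server (e.g. `docker run --dns 8.8.8.8 …`, or a `"dns"` entry in `/etc/docker/daemon.json`). A minimal sketch for diagnosing this from inside the container — `can_resolve` is a hypothetical helper, not part of the repository:

```python
# Sketch: check whether DNS resolution works inside the container.
# Run this in the container before starting the downloads; if it
# returns False for a known-good host, the container's DNS is broken
# and the --dns workaround above is worth trying.
import socket

def can_resolve(host: str) -> bool:
    """Return True if `host` resolves to at least one address."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False
```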
-
Could this official repository https://github.com/tensorflow/tensor2tensor support BERT?
daiwk updated 5 years ago
-
Is it possible to sort the downloaded files author-wise here?
Thanks!
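If the download list (e.g. a `url_list.jsonl`) carries author metadata, grouping files by author is straightforward. A sketch under that assumption — the `"author"` and `"file_name"` keys are hypothetical and should be adjusted to the actual record fields:

```python
# Sketch: group downloaded files by author, assuming each line of the
# JSONL download list is an object with (hypothetical) "author" and
# "file_name" fields. Records missing an author go under "unknown".
import json
from collections import defaultdict

def group_by_author(jsonl_lines):
    """Map author name -> list of file names, from JSONL records."""
    by_author = defaultdict(list)
    for line in jsonl_lines:
        record = json.loads(line)
        by_author[record.get("author", "unknown")].append(record.get("file_name"))
    return dict(by_author)
```

The resulting mapping can then drive moving each file into a per-author directory.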
-
Here are the files I see after going through data download instructions in https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT
The wikipedia directory seems …
-
Hi all,
I am trying to generate the pretraining corpus for BERT with pregenerate_training_data.py. The BERT paper reports about 6M+ instances (segment A + segment B, fewer than 512 tokens), but I get 18M…
-
example:
python3.6 download_files.py --list url_list.jsonl --out out_txts --trash-bad-count
0 files had already been saved in out_txts.
File is not a zip file …
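"File is not a zip file" is the message of Python's `zipfile.BadZipFile`, typically raised when a download was truncated or the server returned an HTML error page instead of the archive. A minimal sketch of skipping such files rather than aborting the whole run (`extract_or_skip` is a hypothetical helper, not the script's actual API):

```python
# Sketch: extract a downloaded archive, skipping corrupt ones instead
# of letting zipfile.BadZipFile abort the batch. A False return marks
# the file as a candidate for re-download (cf. --trash-bad-count).
import zipfile

def extract_or_skip(path, out_dir):
    """Extract a zip archive; return False (and skip) if it is corrupt."""
    try:
        with zipfile.ZipFile(path) as zf:
            zf.extractall(out_dir)
        return True
    except zipfile.BadZipFile:
        return False
```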
-
BookCorpus (http://yknzhu.wixsite.com/mbweb) no longer provides the dataset, and I have not found it online. Do you still have a backup of the dataset? Could you send me a copy of the data?
funqc updated 5 years ago
-
Hi, I have some questions about the details of the Chinese BERT-Base model.
1. Is the model trained on the entire Chinese Wikipedia raw text?
2. Are there additional pre-processing steps for the raw corp…
-
Hi, thanks for your code, it's really useful for most NLP researchers, and thank you again.
When I run this code, it is often interrupted by a network error after downloading a few files. I thought …
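Transient network errors mid-run can be absorbed with a retry-with-backoff wrapper around the per-file download call. A sketch — `fetch` here is a stand-in for whatever download function the script actually uses, not the repository's real API:

```python
# Sketch: retry a per-file download with exponential backoff so one
# transient network error does not abort the whole batch. `fetch` is
# a hypothetical callable that downloads one URL and may raise OSError.
import time

def download_with_retry(fetch, url, retries=3, backoff=2.0):
    """Call fetch(url), retrying with exponential backoff on OSError."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            time.sleep(backoff * (2 ** attempt))
```

Pairing this with a check for files already present in the output directory makes the run resumable after an interruption.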
-
Hi, thank you very much for the implementation!
I'm trying to compare your implementation with the official TF BERT head-to-head with the Gutenberg dataset (since the BookCorpus dataset is no longe…