google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0
37.9k stars 9.57k forks

create_pretraining_data.py is getting killed with the BookCorpus dataset #1270

Open sairamgajavelli opened 2 years ago

sairamgajavelli commented 2 years ago

The dataset I am using is BookCorpus, which has 18,000 books. The system I am running on has 64 GB of RAM. When I try to generate the pretraining data using create_pretraining_data.py, the process gets killed partway through, and only a single core is used. Please suggest a solution.

sairamgajavelli commented 2 years ago

Please provide an update on this.
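A possible workaround, sketched under assumptions rather than confirmed in this thread: create_pretraining_data.py appears to keep every example for its input in memory and to run as a single process, so a common approach is to split the corpus into smaller shards and run the script once per shard. The sketch below assumes the corpus is a single text file with one sentence per line and blank lines between documents. The helper names (`iter_documents`, `shard_corpus`, `build_command`), the shard file naming, and the paths are hypothetical; only the `--input_file`, `--output_file`, and `--vocab_file` flags come from the script itself.

```python
import os
from typing import Iterator, List


def iter_documents(path: str) -> Iterator[List[str]]:
    """Yield one document at a time as a list of non-empty lines.

    Assumes blank lines separate documents, so only a single
    document is held in memory at any point.
    """
    doc: List[str] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.strip():
                doc.append(line)
            elif doc:
                yield doc
                doc = []
    if doc:
        yield doc


def shard_corpus(input_path: str, output_dir: str, docs_per_shard: int) -> List[str]:
    """Split the corpus into shard files without cutting a document in half.

    Returns the list of shard paths (hypothetical naming: shard_00000.txt, ...).
    """
    if docs_per_shard < 1:
        raise ValueError("docs_per_shard must be at least 1")
    os.makedirs(output_dir, exist_ok=True)
    shard_paths: List[str] = []
    out = None
    try:
        for i, doc in enumerate(iter_documents(input_path)):
            if i % docs_per_shard == 0:
                # Start a new shard every docs_per_shard documents.
                if out is not None:
                    out.close()
                path = os.path.join(output_dir, f"shard_{len(shard_paths):05d}.txt")
                shard_paths.append(path)
                out = open(path, "w", encoding="utf-8")
            # Keep the blank line so document boundaries survive.
            out.write("\n".join(doc) + "\n\n")
    finally:
        if out is not None:
            out.close()
    return shard_paths


def build_command(shard_path: str, vocab_file: str) -> List[str]:
    """Build the argument list for one shard; add other flags as needed."""
    return [
        "python",
        "create_pretraining_data.py",
        f"--input_file={shard_path}",
        f"--output_file={shard_path}.tfrecord",
        f"--vocab_file={vocab_file}",
    ]
```

Each command returned by `build_command` can then be launched as a separate process, for example with `subprocess.run`, which lets several cores work at once. The number of concurrent processes and the shard size would need tuning so that the combined memory use stays below the available 64 GB.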