State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
[Bert/Pytorch] Difference between data_download.sh and create_dataset_from scratch.sh #1326
Describe the bug
This is not a bug but a question: what is the difference between data_download.sh and create_dataset_from scratch.sh? README.md suggests create_dataset_from scratch.sh as the way to download and preprocess the data, and does not mention data_download.sh.
As I understand it, in addition to downloading Wikipedia, data_download.sh also downloads BookCorpus for pre-training. So why is data_download.sh not used to prepare the pre-training data?
Related to Bert/Pytorch