NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Book Corpus download link #262

GeondoPark closed this issue 4 years ago

GeondoPark commented 4 years ago

Will you no longer provide a list of Book Corpus download links?

swethmandava commented 4 years ago

The BookCorpus download is still supported:

https://github.com/NVIDIA/DeepLearningExamples/blob/f2fe0904cf646cf7e1341069f838d57242358c55/TensorFlow/LanguageModeling/BERT/data/create_datasets_from_start.sh#L19

GeondoPark commented 4 years ago

Thank you for answering. I saw that the code uses the crawler from here: https://github.com/soskek/bookcorpus

When I run the crawler and download too many files, I start getting 404 Not Found or 403 Forbidden errors and can't crawl any further. Is there a way around this, or somewhere I can get permission? Please let me know if you know how. Thank you.

swethmandava commented 4 years ago

See #247, #37, #90.

It is a BooksCorpus server/maintenance issue.
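For anyone hitting the 403/404 responses described above, here is a minimal sketch of retrying a download with exponential backoff. It is not the code used by the soskek/bookcorpus crawler or by this repository. The function name `fetch_with_backoff`, the list of retried status codes, and the delay values are all assumptions made for illustration. If the server itself is down or under maintenance, waiting and retrying will not help.

```python
import time
import urllib.error
import urllib.request

# Status codes treated as "possibly throttled". This list is an assumption
# based on the errors reported in this thread, not documented server behaviour.
RETRY_STATUSES = (403, 404, 429, 503)


def fetch_with_backoff(url, max_retries=4, base_delay=2.0,
                       open_url=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body for url, retrying with exponential backoff.

    open_url and sleep are injectable so the logic can be tried offline.
    """
    for attempt in range(max_retries + 1):
        try:
            with open_url(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Re-raise other HTTP errors, or once the retries are used up.
            if err.code not in RETRY_STATUSES or attempt == max_retries:
                raise
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a URL that keeps failing is attempted five times in total, with waits of 2, 4, 8 and 16 seconds in between, after which the last `HTTPError` is raised to the caller.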