NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Is the Toronto corpus still available? #211

Closed (forjiuzhou closed this issue 4 years ago)

forjiuzhou commented 5 years ago

BooksDownloader runs python3 /workspace/bookcorpus/download_files.py, but I can't find this script.

nvcforster commented 5 years ago

Hi, it is still available. Does it work if you run python3 /workspace/bert/data/bertPrep.py --action download --dataset bookscorpus?

The Dockerfile should clone the bookcorpus scripts into /workspace/bookcorpus when the image is built. I just tested a build from a fresh clone of our DL examples to verify that it builds correctly. If files are missing, please check that there were no errors during the Docker build.
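
To narrow down where things went wrong, a quick check along these lines could be run inside the container. This is only a minimal sketch: the path is the one quoted above, while the helper name check_bookcorpus_script is hypothetical and not part of the repository.

```python
from pathlib import Path

# Path quoted in this thread; adjust it if your image uses a different layout.
DEFAULT_SCRIPT = Path("/workspace/bookcorpus/download_files.py")


def check_bookcorpus_script(script_path=DEFAULT_SCRIPT):
    """Hypothetical helper: return (ok, message) saying whether the script exists."""
    script_path = Path(script_path)
    if script_path.is_file():
        return True, f"Found {script_path}"
    if script_path.parent.is_dir():
        # The directory was created, but the expected file is not in it.
        return False, f"{script_path.parent} exists but {script_path.name} is missing"
    # The directory itself is absent, which suggests the clone step did not run.
    return False, f"{script_path.parent} does not exist; check the Docker build log for a failed clone"


if __name__ == "__main__":
    ok, message = check_bookcorpus_script()
    print(message)
```

If the directory is missing altogether, the clone step most likely failed during the build, and the build log is the place to look.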