google-research / albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Apache License 2.0

Default Tutorial Not Working - Can't download MRPC data #257

Open Jadiker opened 2 years ago

Jadiker commented 2 years ago

When running the "prepare for training" tutorial code in Google Colab, I get the following error:

**** Model output directory: gs://albert_glue_tutorial/albert-tfhub/models/MRPC *****
Cloning into 'download_glue_repo'...
remote: Enumerating objects: 24, done.
remote: Total 24 (delta 0), reused 0 (delta 0), pack-reused 24
Unpacking objects: 100% (24/24), done.
Processing MRPC...
Traceback (most recent call last):
  File "download_glue_repo/download_glue_data.py", line 150, in <module>
    sys.exit(main(sys.argv[1:]))
  File "download_glue_repo/download_glue_data.py", line 142, in main
    format_mrpc(args.data_dir, args.path_to_mrpc)
  File "download_glue_repo/download_glue_data.py", line 65, in format_mrpc
    URLLIB.urlretrieve(MRPC_TRAIN, mrpc_train_file)
NameError: name 'URLLIB' is not defined
***** Task data directory: glue_data *****

I followed the instructions as written in the Colab: I set up storage, filled in the parameters, set the runtime to TPU, and then clicked "Run all". How can I download the GLUE data needed for MRPC?

Jadiker commented 2 years ago

Looks like this issue has been noted here

Jadiker commented 2 years ago

The issue was fixed by doing the following.

  1. Click "Show Code" on the code cell where parameters (Bucket, Task, and Albert_Model) are filled in
  2. If you've already run the script once, delete the download_glue_repo folder first, because a second clone into an existing folder will fail. To do this, add the line !rm -rf download_glue_repo right after the # Download glue data. comment
  3. Clone the fixed repo in place of the broken one. The fixed repo can be found here and was mentioned here. To do this, change the !git clone line to !git clone https://gist.github.com/fef1601580f269eca73bf26a198595f3.git download_glue_repo
  4. Rerun everything. This time, the dataset should be downloaded correctly.
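
For anyone who would rather do steps 2 and 3 from plain Python than with ! shell lines, here is a minimal sketch. The gist URL and the download_glue_repo folder name are taken from the steps above; the helper names (clone_command, remove_stale_clone, refresh_clone) are made up for illustration and are not part of the tutorial. It assumes git is available on the runtime.

```python
import shutil
import subprocess
from pathlib import Path

# URL of the fixed gist and the clone folder, both quoted in the steps above.
FIXED_GIST_URL = "https://gist.github.com/fef1601580f269eca73bf26a198595f3.git"
CLONE_DIR = "download_glue_repo"


def clone_command(url=FIXED_GIST_URL, target=CLONE_DIR):
    """Return the git command that clones `url` into `target` (hypothetical helper)."""
    return ["git", "clone", url, target]


def remove_stale_clone(target=CLONE_DIR):
    """Delete a leftover clone folder; return True if one was removed."""
    path = Path(target)
    if path.exists():
        shutil.rmtree(path)
        return True
    return False


def refresh_clone(url=FIXED_GIST_URL, target=CLONE_DIR):
    """Rough equivalent of `!rm -rf download_glue_repo` followed by `!git clone ...`."""
    remove_stale_clone(target)
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(clone_command(url, target), check=True)
```

Calling refresh_clone() in a cell before the download step should leave a fresh copy of the fixed script in download_glue_repo, after which the rest of the cell can run as before.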