alexa / dialoglue

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
https://evalai.cloudcv.org/web/challenges/challenge-page/708/overview
Apache License 2.0

Error when downloading data #3

Closed · zqwerty closed this 3 years ago

zqwerty commented 3 years ago

Error when running bash download_data.sh in the data_utils dir:

mkdir: dialoglue: File exists
Do you wish to download dataset hwu?
1) Yes
2) No
#? 1
Downloading dataset hwu into ../dialoglue/hwu
Getting train data
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [01:10<00:00,  1.10s/it]
Getting test data
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [01:08<00:00,  1.07s/it]
Creating categories.json file
Dataset has been downloaded
Creating train_10.csv, etc...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:00<00:00, 4190.11it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:00<00:00, 5320.93it/s]
Done!
Do you wish to download dataset clinc?
1) Yes
2) No
#? 1
Downloading dataset clinc into ../dialoglue/clinc
Dataset has been downloaded
Creating train_10.csv, etc...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [00:00<00:00, 4146.97it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [00:00<00:00, 4657.23it/s]
Done!
Do you wish to download dataset banking?
1) Yes
2) No
#? 1
Downloading dataset banking into ../dialoglue/banking
Getting file: train.csv
Getting file: test.csv
Getting file: categories.json
Dataset has been downloaded
Creating train_10.csv, etc...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [00:00<00:00, 4635.72it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [00:00<00:00, 3855.20it/s]
Done!
/Users/zhuqi/Documents/share/research/Platform/dialoglue/data_utils
Processing dialoglue/hwu/
Processing dialoglue/banking/
Processing dialoglue/clinc/
Done downloading intent datasets
Cloning into 'task-specific-datasets'...
remote: Enumerating objects: 103, done.
remote: Counting objects: 100% (103/103), done.
remote: Compressing objects: 100% (58/58), done.
remote: Total 103 (delta 58), reused 77 (delta 45), pack-reused 0
Receiving objects: 100% (103/103), 1001.92 KiB | 339.00 KiB/s, done.
Resolving deltas: 100% (58/58), done.
Traceback (most recent call last):
  File "process_slot.py", line 14, in <module>
    train_data = json.load(open(dataset + "train_0.json"))
FileNotFoundError: [Errno 2] No such file or directory: 'dialoglue/restaurant8k/train_0.json'
Traceback (most recent call last):
  File "process_slot.py", line 30, in <module>
    sub_train_data = json.load(open(dataset + sub + "/train_0.json"))
FileNotFoundError: [Errno 2] No such file or directory: 'dialoglue/dstc8_sgd/Buses_1/train_0.json'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:01:15 --:--:--     0curl: (7) Failed to connect to fb.me port 80: Operation timed out
unzip:  cannot find or open sem.zip, sem.zip.zip or sem.zip.ZIP.
Traceback (most recent call last):
  File "process_top.py", line 97, in <module>
    data = read_data(data_file)
  File "process_top.py", line 16, in read_data
    f = open(data_file)
FileNotFoundError: [Errno 2] No such file or directory: 'top-dataset-semantic-parsing/train.tsv'
cp: top-dataset-semantic-parsing/train.txt: No such file or directory
cp: top-dataset-semantic-parsing/train_10.txt: No such file or directory
cp: top-dataset-semantic-parsing/eval.txt: No such file or directory
cp: top-dataset-semantic-parsing/test.txt: No such file or directory
cp: top-dataset-semantic-parsing/vocab.*: No such file or directory
rm: sem.zip: No such file or directory
Cloning into 'trippy-public'...
remote: Enumerating objects: 77, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 77 (delta 21), reused 60 (delta 12), pack-reused 0
Unpacking objects: 100% (77/77), done.
/Users/zhuqi/Documents/share/research/Platform/dialoglue/data_utils
mv: rename dialoglue/multiwoz/MULTIWOZ2.1 to dialoglue/multiwoz/MULTIWOZ2.1/MULTIWOZ2.1: Invalid argument
Traceback (most recent call last):
  File "merge_data.py", line 61, in <module>
    train += load_top("dialoglue/top/")
  File "merge_data.py", line 21, in load_top
    data = open(fn+"train.txt").readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'dialoglue/top/train.txt'
Shikib commented 3 years ago

I can't reproduce this issue, so my guess is that it's system-specific. Could you please run these commands? If they error out, please post the error. If they work, could you show the contents of dialoglue/?

git clone https://github.com/PolyAI-LDN/task-specific-datasets
mv task-specific-datasets/span_extraction/restaurant8k/ dialoglue/restaurant8k
mv task-specific-datasets/span_extraction/dstc8/ dialoglue/dstc8_sgd
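
If the commands above succeed, a quick listing is enough to show the contents Shikib asks for (a minimal sketch; any directory listing works):

ls dialoglue/
ls dialoglue/restaurant8k/ dialoglue/dstc8_sgd/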
zqwerty commented 3 years ago

Thanks! I looked into download_data.sh and the problem comes from curl -L "http://fb.me/semanticparsingdialog" --output sem.zip. So I downloaded the TOP dataset manually to work around it.
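
For anyone hitting the same timeout, a rough sketch of the manual workaround, assuming the TOP archive was fetched in a browser and saved as sem.zip inside data_utils. The steps mirror the unzip/cp/rm sequence in the log above; the dialoglue/top/ destination is inferred from merge_data.py expecting dialoglue/top/train.txt, and process_top.py is assumed to write the *.txt files that get copied:

unzip sem.zip                    # should extract top-dataset-semantic-parsing/
python process_top.py            # reads train.tsv etc.; assumed to write the *.txt splits, per the log
mkdir -p dialoglue/top
cp top-dataset-semantic-parsing/train.txt dialoglue/top/
cp top-dataset-semantic-parsing/train_10.txt dialoglue/top/
cp top-dataset-semantic-parsing/eval.txt dialoglue/top/
cp top-dataset-semantic-parsing/test.txt dialoglue/top/
cp top-dataset-semantic-parsing/vocab.* dialoglue/top/
rm sem.zip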