Please see the trace below for several datasets that fail to download. I am commenting out the ones that fail and moving on to the next one, but each attempt throws the error below. The download links themselves work fine, yet the script is not able to find the source, which is very strange.
Finding source for components/stackexchange/stackexchange_dataset.tar
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 36.8G/36.8G [5:36:23<00:00, 1.82Mbyte/s]
85%|█████████████████████████████████████████████████████████████████████████████████████████████████▏ | 31.4G/36.8G [5:45:25<59:40, 1.51Mbyte
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 36.8G/36.8G [15:41<00:00, 39.1Mbyte
Traceback (most recent call last):
  File "/mnt/the_pile/utils.py", line 50, in download
    tar_xf(fname)
  File "/mnt/the_pile/utils.py", line 72, in tar_xf
    tf = tarfile.open(x)
  File "/usr/lib/python3.8/tarfile.py", line 1603, in open
    return func(name, "r", fileobj, **kwargs)
  File "/usr/lib/python3.8/tarfile.py", line 1667, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
  File "/usr/lib/python3.8/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'components/stackexchange/stackexchange_dataset.tar'
Download method [direct] https://the-eye.eu/public/AI/pile_preliminary_components/stackexchange_dataset.tar failed, trying next option
0.00byte [00:07, ?byte/s]
0.00byte [00:07, ?byte/s]
0.00byte [00:09, ?byte/s]
Traceback (most recent call last):
  File "/mnt/the_pile/utils.py", line 50, in download
    tar_xf(fname)
  File "/mnt/the_pile/utils.py", line 72, in tar_xf
    tf = tarfile.open(x)
  File "/usr/lib/python3.8/tarfile.py", line 1603, in open
    return func(name, "r", fileobj, **kwargs)
  File "/usr/lib/python3.8/tarfile.py", line 1667, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
  File "/usr/lib/python3.8/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'components/stackexchange/stackexchange_dataset.tar'
Download method [direct] http://eaidata.bmk.sh/data/stackexchange_dataset.tar failed, trying next option
Traceback (most recent call last):
  File "the_pile/pile.py", line 360, in <module>
    dset._download()
  File "/mnt/the_pile/datasets.py", line 456, in _download
    download('components/stackexchange/stackexchange_dataset.tar', 'f64f31d20db8d8692c1a019314a14974b4911a34ffef126feaf42da88860c666', [
  File "/mnt/the_pile/utils.py", line 67, in download
raise Exception('Failed to download {} from any source'.format(fname))
Exception: Failed to download components/stackexchange/stackexchange_dataset.tar from any source
Finding source for components/bookcorpus/books1.tar.gz
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.40G/2.40G [06:38<00:00, 6.03Mbyte/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.40G/2.40G [06:03<00:00, 6.62Mbyte/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.40G/2.40G [05:41<00:00, 7.05Mbyte/s]
Traceback (most recent call last):
  File "/mnt/the_pile/utils.py", line 50, in download
    tar_xf(fname)
  File "/mnt/the_pile/utils.py", line 72, in tar_xf
    tf = tarfile.open(x)
  File "/usr/lib/python3.8/tarfile.py", line 1603, in open
    return func(name, "r", fileobj, **kwargs)
  File "/usr/lib/python3.8/tarfile.py", line 1667, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
  File "/usr/lib/python3.8/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'components/bookcorpus/books1.tar.gz'
Download method [direct] https://the-eye.eu/public/AI/pile_preliminary_components/books1.tar.gz failed, trying next option
2.40Gbyte [03:22, 11.9Mbyte/s]
2.40Gbyte [03:20, 12.0Mbyte/s]
2.40Gbyte [03:20, 12.0Mbyte/s]
Traceback (most recent call last):
  File "/mnt/the_pile/utils.py", line 50, in download
    tar_xf(fname)
  File "/mnt/the_pile/utils.py", line 72, in tar_xf
    tf = tarfile.open(x)
  File "/usr/lib/python3.8/tarfile.py", line 1603, in open
    return func(name, "r", fileobj, **kwargs)
  File "/usr/lib/python3.8/tarfile.py", line 1667, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
  File "/usr/lib/python3.8/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'components/bookcorpus/books1.tar.gz'
Download method [direct] http://battle.shawwn.com/sdb/books1/books1.tar.gz failed, trying next option
Traceback (most recent call last):
  File "the_pile/pile.py", line 360, in <module>
    dset._download()
  File "/mnt/the_pile/datasets.py", line 106, in _download
    download('components/bookcorpus/books1.tar.gz', 'e3c993cc825df2bdf0f78ef592f5c09236f0b9cd6bb1877142281acc50f446f9', [
  File "/mnt/the_pile/utils.py", line 67, in download
raise Exception('Failed to download {} from any source'.format(fname))
Exception: Failed to download components/bookcorpus/books1.tar.gz from any source
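In both cases the progress bar reaches 100% and the failure is a FileNotFoundError inside tar_xf, so it may help to check, outside the script, whether the archive ever ends up at the relative path being opened and whether it matches the checksum passed to download. Below is a minimal sketch of such a check. check_download is a hypothetical helper and not part of the_pile; the path and SHA-256 value are copied from the trace above, and it assumes the check is run from the same working directory as pile.py.

```python
import hashlib
import os


def check_download(path, expected_sha256, chunk_size=1 << 20):
    """Hypothetical helper (not part of the_pile): report whether `path`
    exists and whether its SHA-256 digest equals `expected_sha256`."""
    if not os.path.isfile(path):
        return {"exists": False, "size": None, "sha256_ok": False}
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so multi-gigabyte archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "exists": True,
        "size": os.path.getsize(path),
        "sha256_ok": digest.hexdigest() == expected_sha256,
    }


if __name__ == "__main__":
    # Relative paths resolve against the current working directory,
    # so print it to confirm where the script is actually looking.
    print("cwd:", os.getcwd())
    print(check_download(
        "components/bookcorpus/books1.tar.gz",
        "e3c993cc825df2bdf0f78ef592f5c09236f0b9cd6bb1877142281acc50f446f9",
    ))
```

If this reports that the file does not exist even though the download finished, the archive is probably being written somewhere other than the path tar_xf opens, which would point at the working directory or the target folder rather than at the download links.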