Kaggle / kaggle-api

Official Kaggle API
Apache License 2.0

Dataset download suddenly stops and never resumes; it just gets stuck. #113

Closed Kwongrf closed 1 year ago

Kwongrf commented 5 years ago

kaggle competitions download -c airbus-ship-detection -f train_v2.zip

2018-10-29 20:22:23,200 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError("bad handshake: SysCallError(104, 'ECONNRESET')",),)': /kaggle-competitions-data/kaggle/9988/128738/train_v2.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1541074682&Signature=rHjYOnhHRhCQD3l%2F%2FVQkLUXkt5BbJhIa37KfRhM3sa6jFgMVoK%2FtJsiMr4ZAXtMXZ87itkHUn1GUbYAb5j6srEn0A3f%2BrJMrnYQmeJJ4Vi5ra7kY49AzRQf2ZMnfJGbn80NY9rz5QLIKbYpQaO80hUo%2B0ZVqrfGNcwoGYtHl%2BJjmIcKvTXJUEh5SGR ......
2018-10-29 20:26:39,692 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', OSError("(104, 'ECONNRESET')",))': /kaggle-competitions-data/kaggle/9988/128738/train_v2.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1541074682&Signature=rHjYOnhHRhCQD3l%2F%2FVQkLUXkt5BbJhIa37KfRhM3sa6jFgMVoK%2FtJsiMr4ZAXtMXZ87itkHUn1GUbYAb5j6srEn0A3f%2BrJMrnYQmeJJ4Vi5ra7kY49AzRQf2ZMnfJGbn80NY9rz5QLIKbYpQaO80hUo%2B0ZVqrfGNcwoGYtHl%2BJjmIcKvTXJUEh5SGR......
Downloading train_v2.zip to /data/krf/another
0%| | 3.00M/26.4G [00:19<5:16:36, 1.49MB/s]

I reran this command many times, but each time it either failed to start or started and then stopped almost immediately. I have no idea where the problem is.
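One cause the log itself can rule out is an expired download link. The `Expires` query parameter in the signed URL is a Unix timestamp, so it can be decoded and compared with the log timestamps. The sketch below is illustrative only: the helper name `signed_url_expiry` is hypothetical, and the URL is the one from the first warning with the `Signature` parameter left out.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def signed_url_expiry(url: str) -> datetime:
    """Return the UTC expiry time encoded in a signed URL's Expires parameter."""
    query = parse_qs(urlsplit(url).query)
    return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)


# Path and Expires value copied from the first warning in the log above;
# the Signature parameter is omitted because it is not needed here.
url = (
    "/kaggle-competitions-data/kaggle/9988/128738/train_v2.zip"
    "?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com"
    "&Expires=1541074682"
)
print(signed_url_expiry(url).isoformat())  # 2018-11-01T12:18:02+00:00
```

The link expires on 2018-11-01, days after the 2018-10-29 log entries, so the resets are not explained by an expired URL.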

ghost commented 5 years ago

I cannot reproduce this. What client version are you using? Is this still happening?

Kwongrf commented 5 years ago

Kaggle API 1.4.7.1

It's still happening, but I had no problem downloading another competition's dataset. [image]

It's very weird, because I can't download this dataset on my laptop either. But I can download it on another server where I installed kaggle-api just now. They all use the same token.

ghost commented 5 years ago

Typically when I see this, it's due to server-side connectivity issues or because I mistakenly left debug settings in the API client (it turns out the client gets very confused when you point it at a server that doesn't exist). The latter problem shouldn't exist with 1.4.7.1, and I don't see any related errors on the server side.

I'll try to investigate this, but being unable to reproduce it is going to make it challenging.

Kwongrf commented 5 years ago

Thank you. The problem is getting worse: I can no longer download another dataset on the other server, which was able to download datasets last night. [image] You can see that it downloaded a small file and then got stuck...

Three different machines have the same problem (Win10, Ubuntu14, Ubuntu16). They all use the same token. I hope you can solve this soon, since I'm wasting a lot of time trying to download the dataset.

ghost commented 5 years ago

Are all of those machines on the same network? Are you able to download the problematic files from the Kaggle website?

Kwongrf commented 5 years ago

Two of them are on the same network, but the third one isn't. Even across different networks, none of them can download the file. Today I got the following traceback:

C:\Users\kwong>kaggle competitions download -c airbus-ship-detection -f train_v2.zip
2018-10-31 09:13:28,054 WARNING Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', OSError("(10060, 'WSAETIMEDOUT')",))': /kaggle-competitions-data/kaggle/9988/128738/train_v2.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1541207587&Signature=D21eqFvKlCOlxBoWhAu2l9LTnwe4vWs1TZiFNHcAEGzSpGTJfSzBURa4f7YNwHGm3I2nbKpqvq9PPzgErkzMCbWahbDughXHOEvAwliNXVqyWx2lHSYG6oEeNWEQuOoY6J19xVicyyKbJF8MCUM%2BSxNp68cSLGa7mZw879mo2%2FCbv%2FmoQtKK8JcIiOXQehjrDTsv7GPUOjwptmvVaCFAnaD3IRvt%2FoPTeCXEFqAJ5hvvEZyhHcC5FDprZQkx4dm5crxihkHpDBTcZXD%2F%2BbQRzzzogKNHx09P%2Fw3kk0fGayo8RV986tSBH3z3OUytZWG58GoIs3SS703rxksijR%2BjTw%3D%3D
2018-10-31 09:14:10,126 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000027C42FDCBA8>: Failed to establish a new connection: [WinError 10060] 由于连接方在一段时间后没 有正确答复或连接的主机没有反应,连接尝试失败。',)': /kaggle-competitions-data/kaggle/9988/128738/train_v2.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1541207587&Signature=D21eqFvKlCOlxBoWhAu2l9LTnwe4vWs1TZiFNHcAEGzSpGTJfSzBURa4f7YNwHGm3I2nbKpqvq9PPzgErkzMCbWahbDughXHOEvAwliNXVqyWx2lHSYG6oEeNWEQuOoY6J19xVicyyKbJF8MCUM%2BSxNp68cSLGa7mZw879mo2%2FCbv%2FmoQtKK8JcIiOXQehjrDTsv7GPUOjwptmvVaCFAnaD3IRvt%2FoPTeCXEFqAJ5hvvEZyhHcC5FDprZQkx4dm5crxihkHpDBTcZXD%2F%2BbQRzzzogKNHx09P%2Fw3kk0fGayo8RV986tSBH3z3OUytZWG58GoIs3SS703rxksijR%2BjTw%3D%3D
Traceback (most recent call last):
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\util\connection.py", line 83, in create_connection
    raise err
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\util\connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] 由于连接方在一段时间后没有正确答复或连接的主机没有反应,连接尝试失败。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\connection.py", line 284, in connect
    conn = self._new_conn()
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x0000027C42FDCCF8>: Failed to establish a new connection: [WinError 10060] 由于连接方在一段时间后没有正确答复或连接的主机没有反应,连接尝试失败。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\softwares\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\softwares\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Softwares\anaconda3\Scripts\kaggle.exe\__main__.py", line 9, in <module>
  File "d:\softwares\anaconda3\lib\site-packages\kaggle\cli.py", line 50, in main
    out = args.func(**command_args)
  File "d:\softwares\anaconda3\lib\site-packages\kaggle\api\kaggle_api_extended.py", line 680, in competition_download_cli
    force, quiet)
  File "d:\softwares\anaconda3\lib\site-packages\kaggle\api\kaggle_api_extended.py", line 617, in competition_download_file
    id=competition, file_name=file_name, _preload_content=False))
  File "d:\softwares\anaconda3\lib\site-packages\kaggle\api\kaggle_api.py", line 325, in competitions_data_download_file_with_http_info
    collection_formats=collection_formats)
  File "d:\softwares\anaconda3\lib\site-packages\kaggle\api_client.py", line 334, in call_api
    _preload_content, _request_timeout)
  File "d:\softwares\anaconda3\lib\site-packages\kaggle\api_client.py", line 165, in __call_api
    _request_timeout=_request_timeout)
  File "d:\softwares\anaconda3\lib\site-packages\kaggle\api_client.py", line 355, in request
    headers=headers)
  File "d:\softwares\anaconda3\lib\site-packages\kaggle\rest.py", line 251, in GET
    query_params=query_params)
  File "d:\softwares\anaconda3\lib\site-packages\kaggle\rest.py", line 224, in request
    headers=headers)
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\request.py", line 66, in request
    **urlopen_kw)
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\request.py", line 87, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\poolmanager.py", line 349, in urlopen
    return self.urlopen(method, redirect_location, **kw)
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\poolmanager.py", line 321, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 668, in urlopen
    **response_kw)
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 668, in urlopen
    **response_kw)
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "d:\softwares\anaconda3\lib\site-packages\urllib3\util\retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /kaggle-competitions-data/kaggle/9988/128738/train_v2.zip?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1541207587&Signature=D21eqFvKlCOlxBoWhAu2l9LTnwe4vWs1TZiFNHcAEGzSpGTJfSzBURa4f7YNwHGm3I2nbKpqvq9PPzgErkzMCbWahbDughXHOEvAwliNXVqyWx2lHSYG6oEeNWEQuOoY6J19xVicyyKbJF8MCUM%2BSxNp68cSLGa7mZw879mo2%2FCbv%2FmoQtKK8JcIiOXQehjrDTsv7GPUOjwptmvVaCFAnaD3IRvt%2FoPTeCXEFqAJ5hvvEZyhHcC5FDprZQkx4dm5crxihkHpDBTcZXD%2F%2BbQRzzzogKNHx09P%2Fw3kk0fGayo8RV986tSBH3z3OUytZWG58GoIs3SS703rxksijR%2BjTw%3D%3D (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000027C42FDCCF8>: Failed to establish a new connection: [WinError 10060] 由于连接方在一段时间后没有正确答复或连接的主机没有反应,连接尝试失败。',))

The Chinese sentence "由于连接方在一段时间后没有正确答复或连接的主机没有反应,连接尝试失败" means: the connection attempt failed because the connected party did not properly respond after a period of time, or the connected host failed to respond.
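The traceback ends with a failure to open a connection to storage.googleapis.com on port 443, so a plain TCP reachability check is a quick way to confirm whether a machine can reach that host at all. This is a minimal sketch using only the standard library; the helper name `can_connect` and the timeout value are assumptions for illustration, not part of kaggle-api.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (such as WinError 10060), connection resets,
        # refused connections, and DNS resolution failures.
        return False
```

Calling `can_connect("www.kaggle.com")` and `can_connect("storage.googleapis.com")` on an affected machine would show whether only the storage host is unreachable. The check only covers the TCP connection, so a connection that is reset during the TLS handshake, as in the earlier SSLError log, could still pass it.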

[image]

It seems I am able to download the problematic files from the website. I don't think the files themselves are the problem, since the download doesn't get stuck at the same point each time. Sometimes it downloads part of the data, and sometimes it can't download anything at all.

Sometimes I can't even access the Kaggle website; maybe that has the same cause. But there are also times when I can access the Kaggle website and still can't download the dataset with kaggle-api.

db12138 commented 5 years ago

OP, did you ever solve this? I have the same problem: broken by 'ProtocolError

call-me-HOU-GE commented 5 years ago

I ran into this problem too. Judging from the log, the cause is a failure to connect to host='storage.googleapis.com'; the underlying reason is probably known to all of us. I found a mirror of this host that is reachable from mainland China: clmirror.storage.googleapis.com. To work around the problem, ping the mirror to get its IP address, then edit your hosts file so that storage.googleapis.com maps to that IP.
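The hosts-file workaround described above can be sketched as follows. This is only an illustration: the helper names `hosts_line` and `mirror_hosts_line` are hypothetical, the mirror hostname is taken from this comment and may not resolve or remain available, and the code resolves the name through DNS rather than with ping.

```python
import socket


def hosts_line(ip: str, hostname: str) -> str:
    """Format one hosts-file entry that maps hostname to ip."""
    return f"{ip}\t{hostname}"


def mirror_hosts_line(mirror: str, target: str = "storage.googleapis.com") -> str:
    """Resolve mirror to an IPv4 address and build the hosts entry for target."""
    # gethostbyname raises socket.gaierror if the mirror name cannot be resolved.
    return hosts_line(socket.gethostbyname(mirror), target)
```

Running `print(mirror_hosts_line("clmirror.storage.googleapis.com"))` would print the line to append to the hosts file (`/etc/hosts` on Linux, `C:\Windows\System32\drivers\etc\hosts` on Windows; editing either requires administrator rights). The entry redirects all traffic for storage.googleapis.com on that machine, so it is worth removing once the download finishes.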

HYBB-rash commented 4 years ago

Thanks, that solved it.