DCOR-dev / DCOR-Aid

GUI for managing data on DCOR
https://dcor.mpl.mpg.de
GNU General Public License v3.0

Connection to DCOR server failed, cli throws error and doesn't finish upload job when connection is restored #66

Closed B-Hartmann closed 1 year ago

B-Hartmann commented 1 year ago

Windows 10 Python 3.10.8 dcoraid 0.11.9

I used a custom Python script that uses the dcoraid CLI to upload data to a DCOR instance. It seems that the connection to the server was briefly interrupted, causing dcoraid to throw an error. It didn't finish the upload job, but my script went on to the next file, which was uploaded successfully. So I ended up with a draft dataset on DCOR and an unfinished job in the CLI, which can be problematic if you don't re-check the full output of the cmd window after the process has finished. Issues like that are easy to overlook, and then you are missing some data.

Since it is common (at least for private households, at least in Germany) for the internet connection to be briefly interrupted every day because internet service providers assign new IP addresses (IPv4 address shortage), would it make sense to put the process to sleep for 30 seconds when a "TimeoutError" occurs and then retry the upload, or something like that?

Traceback (most recent call last):
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 1374, in getresponse
    response.begin()
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\adapters.py", line 489, in send
    resp = conn.urlopen(
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\util\retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\packages\six.py", line 770, in reraise
    raise value
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connectionpool.py", line 451, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connectionpool.py", line 340, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='dcor-colab.mpl.mpg.de', port=443): Read timed out. (read timeout=27.9)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\dcoraid\api\dataset.py", line 205, in resource_add
    api.post("package_revise",
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\dcoraid\api\ckan_api.py", line 312, in post
    req = requests.post(url_call,
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\adapters.py", line 578, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='dcor-colab.mpl.mpg.de', port=443): Read timed out. (read timeout=27.9)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\util\connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x0000016ADF837A00>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\adapters.py", line 489, in send
    resp = conn.urlopen(
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\urllib3\util\retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='dcor-colab.mpl.mpg.de', port=443): Max retries exceeded with url: /api/3/action/package_show?id=3fd49ce1-63bc-48fc-8303-7148fa4f4ecf (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000016ADF837A00>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\dcoraid\cli.py", line 77, in upload_task
    uj.task_upload_resources()
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\dcoraid\upload\job.py", line 424, in task_upload_resources
    srv_time = resource_add(
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\dcoraid\api\dataset.py", line 218, in resource_add
    if resource_exists(dataset_id=dataset_id,
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\dcoraid\api\dataset.py", line 264, in resource_exists
    pkg_dict = api.get("package_show", id=dataset_id)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\dcoraid\api\ckan_api.py", line 207, in get
    req = requests.get(self.api_url + api_call,
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\username\Documents\BDA\dcor_uploads\venv\lib\site-packages\requests\adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='dcor-colab.mpl.mpg.de', port=443): Max retries exceeded with url: /api/3/action/package_show?id=3fd49ce1-63bc-48fc-8303-7148fa4f4ecf (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000016ADF837A00>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
paulmueller commented 1 year ago

Yes, this makes total sense. Servers also have to reboot at times (overnight, which is when DCOR uploads are running as well), so this problem will haunt us indefinitely.
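
For illustration, the sleep-and-retry idea suggested above could look roughly like the sketch below. This is not DCOR-Aid's actual implementation; `retry_call` and all of its parameters are hypothetical names chosen for this example. The default exception tuple uses Python's built-in `TimeoutError` and `ConnectionError`. The exceptions in the traceback come from `requests`, which are separate classes, so a real caller would probably pass something like `(requests.exceptions.ConnectionError, requests.exceptions.Timeout)` as `retry_on`.

```python
import time


def retry_call(func, retry_on=(TimeoutError, ConnectionError),
               max_attempts=5, wait_seconds=30, sleep=time.sleep):
    """Call func() and retry after a pause if a network-type error occurs.

    All names and defaults here are assumptions made for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: re-raise so the caller sees the failure
                raise
            # Wait for the connection to (hopefully) come back, then retry
            sleep(wait_seconds)
```

The `sleep` argument is only there so that the waiting can be replaced during testing. Capping the number of attempts means a server that stays down still results in an error instead of an endless loop.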

paulmueller commented 1 year ago

FYI: If you are running the DCOR-Aid CLI from an external script/program, DCOR-Aid will return a non-zero exit code if the upload fails. You can use this to keep track of all failed uploads and, e.g., show that list to the user.

@B-Hartmann
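
A minimal sketch of how a wrapper script might use the exit code, assuming each upload is started as a separate process. The function name `run_and_collect_failures` is hypothetical, and the commands in the usage part are stand-ins that just exit with a given code; the actual DCOR-Aid command line would have to be substituted.

```python
import subprocess
import sys


def run_and_collect_failures(commands):
    """Run each command and return (command, exit code) pairs for failures."""
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        # A non-zero exit code signals that the command did not succeed
        if result.returncode != 0:
            failed.append((cmd, result.returncode))
    return failed


if __name__ == "__main__":
    # Stand-in commands (NOT the real DCOR-Aid CLI): one succeeds, one fails
    demo_commands = [
        [sys.executable, "-c", "import sys; sys.exit(0)"],
        [sys.executable, "-c", "import sys; sys.exit(1)"],
    ]
    for cmd, code in run_and_collect_failures(demo_commands):
        print(f"FAILED (exit code {code}): {cmd}")
```

Printing the collected list at the very end of the script means a failed upload cannot get lost in the middle of a long cmd window output.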