ProtoLife / daptics-api

API documentation and clients for the daptics.ai design of experiments engine
https://daptics.ai
GNU General Public License v3.0

timeout way too often. #43

Open nhpackard opened 2 years ago

nhpackard commented 2 years ago

The timeout happens most often on calls to daptics.generate_analytics() (as in the example error message below), but it has also been seen on daptics.put_experiments_csv().
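
A possible stopgap (sketch only, untested) is to retry the failing call when the connection drops. Whether re-calling generate_analytics() is actually safe on the server side (e.g. whether it would queue a duplicate analytics task) is an open question; call_with_retries and its retry count/pause below are made up purely for illustration.

import time
import requests

def call_with_retries(fn, attempts=3, pause=60):
    """Call fn(), retrying on dropped connections; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except requests.exceptions.ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(pause)

# For example:
#   result = call_with_retries(daptics.generate_analytics)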

The traceroute suggests the network path itself could be troublesome:

$ traceroute inertia.daptics.ai
traceroute to inertia.daptics.ai (142.254.64.34), 64 hops max, 52 byte packets
 1  192.168.68.1 (192.168.68.1)  2.600 ms  1.291 ms  1.042 ms
 2  pppoe-server.net.ngi.it (81.174.0.21)  8.761 ms  9.667 ms  10.860 ms
 3  10.222.67.234 (10.222.67.234)  19.655 ms  19.773 ms  19.894 ms
 4  10.40.83.121 (10.40.83.121)  11.396 ms  16.729 ms  10.399 ms
 5  10.40.84.134 (10.40.84.134)  12.910 ms  37.984 ms  11.822 ms
 6  et-1-0-19.edge1.milan1.level3.net (213.249.124.141)  11.434 ms  13.086 ms  21.623 ms
 7  gtt-level3-milan1.level3.net (4.68.39.134)  14.996 ms  9.441 ms  10.862 ms
 8  ae9.cr0-pao1.ip4.gtt.net (89.149.128.238)  177.527 ms *  189.262 ms
 9  as7065.xe-1-0-6.ar1.pao1.us.as4436.gtt.net (69.22.130.86)  172.590 ms  168.864 ms  166.638 ms
10  102.ae1.cr1.pao1.sonic.net (70.36.205.5)  174.767 ms  178.644 ms  173.231 ms
11  0.ae0.cr1.colaca01.sonic.net (70.36.205.62)  171.088 ms  191.917 ms  181.782 ms
12  0.ae0.cr1.snrfca01.sonic.net (157.131.209.82)  178.734 ms
    0.ae2.cr2.colaca01.sonic.net (157.131.209.66)  187.424 ms  189.238 ms
13  0.xe-1-3-0.gw3.snfcca01.sonic.net (142.254.59.26)  178.881 ms
    0.ae2.cr2.snrfca01.sonic.net (157.131.209.170)  189.830 ms
    0.xe-1-3-0.gw3.snfcca01.sonic.net (142.254.59.26)  185.560 ms
14  * 0.xe-1-3-0.gw4.snfcca01.sonic.net (142.254.59.66)  175.978 ms
    0.xe-1-3-1.gw4.snfcca01.sonic.net (142.254.59.70)  173.378 ms
15  * * *
16  * * *
17  * * *
18  * * *
...
64  * * *

The timeout error:

---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/urllib3/connection.py:169, in HTTPConnection._new_conn(self)
    168 try:
--> 169     conn = connection.create_connection(
    170         (self._dns_host, self.port), self.timeout, **extra_kw
    171     )
    173 except SocketTimeout:

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/urllib3/util/connection.py:96, in create_connection(address, timeout, source_address, socket_options)
     95 if err is not None:
---> 96     raise err
     98 raise socket.error("getaddrinfo returns an empty list")

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/urllib3/util/connection.py:86, in create_connection(address, timeout, source_address, socket_options)
     85     sock.bind(source_address)
---> 86 sock.connect(sa)
     87 return sock

TimeoutError: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/urllib3/connectionpool.py:699, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    698 # Make the request on the httplib connection object.
--> 699 httplib_response = self._make_request(
    700     conn,
    701     method,
    702     url,
    703     timeout=timeout_obj,
    704     body=body,
    705     headers=headers,
    706     chunked=chunked,
    707 )
    709 # If we're going to release the connection in ``finally:``, then
    710 # the response doesn't need to know about the connection. Otherwise
    711 # it will also try to release it and we'll have a double-release
    712 # mess.

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/urllib3/connectionpool.py:382, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    381 try:
--> 382     self._validate_conn(conn)
    383 except (SocketTimeout, BaseSSLError) as e:
    384     # Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/urllib3/connectionpool.py:1010, in HTTPSConnectionPool._validate_conn(self, conn)
   1009 if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
-> 1010     conn.connect()
   1012 if not conn.is_verified:

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/urllib3/connection.py:353, in HTTPSConnection.connect(self)
    351 def connect(self):
    352     # Add certificate verification
--> 353     conn = self._new_conn()
    354     hostname = self.host

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/urllib3/connection.py:181, in HTTPConnection._new_conn(self)
    180 except SocketError as e:
--> 181     raise NewConnectionError(
    182         self, "Failed to establish a new connection: %s" % e
    183     )
    185 return conn

NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x10843e910>: Failed to establish a new connection: [Errno 60] Operation timed out

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/requests/adapters.py:439, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    438 if not chunked:
--> 439     resp = conn.urlopen(
    440         method=request.method,
    441         url=url,
    442         body=request.body,
    443         headers=request.headers,
    444         redirect=False,
    445         assert_same_host=False,
    446         preload_content=False,
    447         decode_content=False,
    448         retries=self.max_retries,
    449         timeout=timeout
    450     )
    452 # Send the request.
    453 else:

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/urllib3/connectionpool.py:755, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    753     e = ProtocolError("Connection aborted.", e)
--> 755 retries = retries.increment(
    756     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    757 )
    758 retries.sleep()

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/urllib3/util/retry.py:574, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
    573 if new_retry.is_exhausted():
--> 574     raise MaxRetryError(_pool, url, error or ResponseError(cause))
    576 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPSConnectionPool(host='api-files.daptics.ai', port=443): Max retries exceeded with url: /session/S97nfh5bmzf3m2jkzrf5/analytics/gen/3/PredRespProfile2D.pdf?token=QTEyOEdDTQ.fzRqKs95aMkSWaKnTUxdWrC0se_FGz9x62s8S-UpK60sATEBdhanXIoeXVE.iVb7DpOy7QI15LLf.EXN4HFyCgqGtXfa5BA4vzwmRwWxKxpqZ4v008HzJdGfu907KXQAzGRzIBZY1UgwlBlfaTX-8z6nqqznXDj8KeUGKbZaTaOTXLpPnYuBR7RSAfOUOejxEN5Tl7kn9kI13_xBXtzqTZCRqQNKijI-b3f6Y_HUToq-XPo8k7xjNn86x1Vv49DZYrzLKj-Ph9H7W7W5zx34725khBt5KQHBrjxFEM4mgm_xLng.UcB4xy2SIOQ5sR3QM6RxgA (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x10843e910>: Failed to establish a new connection: [Errno 60] Operation timed out'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
Input In [28], in <cell line: 10>()
      5 print('Generating analytics files.')
      7 # Generate any analytics files that are available for this generation.
      8 # Since the `auto_task_timeout` option has been set, the script will
      9 # block until the files are ready to be downloaded.
---> 10 daptics.generate_analytics()
     12 print('Downloading analytics files.')
     14 # Fetch the PDF analytics files via authenticated HTTP, and save them
     15 # to the './output' directory, where your automation workflow
     16 # software can pick them up.

File ~/Projects/daptics-api/python_client/daptics_client/daptics_client.py:2806, in DapticsClient.generate_analytics(self)
   2804 task_id = data['createAnalytics']['taskId']
   2805 self.task_info[task_id] = data['createAnalytics']
-> 2806 auto_task = self._auto_task()
   2807 if auto_task is not None:
   2808     return {'createAnalytics': auto_task}

File ~/Projects/daptics-api/python_client/daptics_client/daptics_client.py:2730, in DapticsClient._auto_task(self, timeout_override)
   2727 if timeout is None:
   2728     return None
-> 2730 data, errors = self.wait_for_current_task(
   2731     task_type=None, timeout=timeout)
   2732 self._raise_exception_on_error(data, errors)
   2734 return data['currentTask']

File ~/Projects/daptics-api/python_client/daptics_client/daptics_client.py:2680, in DapticsClient.wait_for_current_task(self, task_type, timeout)
   2678 retry = 0
   2679 while True:
-> 2680     data, errors = self.poll_for_current_task(task_type)
   2681     if data and 'currentTask' in data and data['currentTask'] is not None:
   2682         status = data['currentTask']['status']

File ~/Projects/daptics-api/python_client/daptics_client/daptics_client.py:2638, in DapticsClient.poll_for_current_task(self, task_type)
   2636                 self.analytics = result['analytics']
   2637                 if auto_export_path is not None:
-> 2638                     self.download_all_analytics_files(
   2639                         self.analytics, auto_export_path, True)
   2640 else:
   2641     data = {'currentTask': None}

File ~/Projects/daptics-api/python_client/daptics_client/daptics_client.py:2848, in DapticsClient.download_all_analytics_files(self, analytics, directory, name_by_gen)
   2846 if 'url' in file and 'filename' in file:
   2847     url, params = self.download_url_and_params(file['url'])
-> 2848     response = requests.get(url, params=params)
   2849     if response.status_code == requests.codes.ok and response.content is not None:
   2850         if file_count == 0:

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/requests/api.py:76, in get(url, params, **kwargs)
     65 r"""Sends a GET request.
     66 
     67 :param url: URL for the new :class:`Request` object.
   (...)
     72 :rtype: requests.Response
     73 """
     75 kwargs.setdefault('allow_redirects', True)
---> 76 return request('get', url, params=params, **kwargs)

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/requests/api.py:61, in request(method, url, **kwargs)
     57 # By using the 'with' statement we are sure the session is closed, thus we
     58 # avoid leaving sockets open which can trigger a ResourceWarning in some
     59 # cases, and look like a memory leak in others.
     60 with sessions.Session() as session:
---> 61     return session.request(method=method, url=url, **kwargs)

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/requests/sessions.py:542, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    537 send_kwargs = {
    538     'timeout': timeout,
    539     'allow_redirects': allow_redirects,
    540 }
    541 send_kwargs.update(settings)
--> 542 resp = self.send(prep, **send_kwargs)
    544 return resp

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/requests/sessions.py:655, in Session.send(self, request, **kwargs)
    652 start = preferred_clock()
    654 # Send the request
--> 655 r = adapter.send(request, **kwargs)
    657 # Total elapsed time of the request (approximately)
    658 elapsed = preferred_clock() - start

File ~/.pyenv/versions/3.9.5/lib/python3.9/site-packages/requests/adapters.py:516, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    512     if isinstance(e.reason, _SSLError):
    513         # This branch is for urllib3 v1.22 and later.
    514         raise SSLError(e, request=request)
--> 516     raise ConnectionError(e, request=request)
    518 except ClosedPoolError as e:
    519     raise ConnectionError(e, request=request)

ConnectionError: HTTPSConnectionPool(host='api-files.daptics.ai', port=443): Max retries exceeded with url: /session/S97nfh5bmzf3m2jkzrf5/analytics/gen/3/PredRespProfile2D.pdf?token=QTEyOEdDTQ.fzRqKs95aMkSWaKnTUxdWrC0se_FGz9x62s8S-UpK60sATEBdhanXIoeXVE.iVb7DpOy7QI15LLf.EXN4HFyCgqGtXfa5BA4vzwmRwWxKxpqZ4v008HzJdGfu907KXQAzGRzIBZY1UgwlBlfaTX-8z6nqqznXDj8KeUGKbZaTaOTXLpPnYuBR7RSAfOUOejxEN5Tl7kn9kI13_xBXtzqTZCRqQNKijI-b3f6Y_HUToq-XPo8k7xjNn86x1Vv49DZYrzLKj-Ph9H7W7W5zx34725khBt5KQHBrjxFEM4mgm_xLng.UcB4xy2SIOQ5sR3QM6RxgA (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x10843e910>: Failed to establish a new connection: [Errno 60] Operation timed out'))
pzingg commented 2 years ago

I hope I have time next week to investigate.

  1. Maybe the analytics generation just takes too long. We can increase the timeout values in daptics_client.py, or on the backend, to cover it (see the sketch below for one possible client-side change).
  2. If I have 5 or 6 hours (?), I can try to go back to the Rserve implementation (instead of the current Rscript one). But I think that only reduces the time needed to load R, which is the same for all API session calls, not just analytics.
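
For option 1, here is a sketch only (not tested), assuming the failure is the bare requests.get() in download_all_analytics_files() as the traceback suggests: mount urllib3 retries on a requests Session and pass an explicit timeout. The retry, backoff, and timeout numbers are placeholders, not recommendations.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_download_session(total_retries=5, backoff=1.0):
    # Retry connection failures and transient 5xx responses with exponential backoff.
    retry = Retry(total=total_retries, backoff_factor=backoff,
                  status_forcelist=[502, 503, 504])
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

# In download_all_analytics_files(), the bare call
#     response = requests.get(url, params=params)
# would then become something like
#     response = session.get(url, params=params, timeout=120)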