Open inglesp opened 4 years ago
This has nothing to do with deploys -- I have run ./manage.py check_numbers
on largeweb2 twice with no deploys in progress, and Selenium timed out both times.
I have also run it from my own laptop, and it hasn't timed out.
So is something funny happening on largeweb2?
Nothing to do with the numbers checker. Selenium times out when making requests to eg google.com.
>>> import logging
>>>
>>> from selenium import webdriver
>>> from selenium.webdriver.firefox.options import Options
>>> from selenium.webdriver.remote.remote_connection import LOGGER as selenium_logger
>>>
>>> ch = logging.StreamHandler()
>>> ch.setLevel(logging.DEBUG)
>>>
>>> selenium_logger.setLevel(logging.DEBUG)
>>> selenium_logger.addHandler(ch)
>>>
>>> options = Options()
>>> options.headless = True >>> with webdriver.Firefox(options=options) as browser:
... browser.set_page_load_timeout(60) ... browser.get("https://google.com/")
...
POST http://127.0.0.1:47137/session {"desiredCapabilities": {"acceptInsecureCerts": true, "moz:firefoxOptions": {"args": ["-headless"]}, "browserName": "firefox", "marionette": true}, "capabilities": {"alwaysMatch": {"acceptInsecureCerts": true, "moz:firefoxOptions": {"args": ["-headless"]}, "browserName": "firefox"}, "firstMatch": [{}]}}
Finished Request
POST http://127.0.0.1:47137/session/c63221a7-7d41-4ffe-98eb-6817df555659/timeouts {"pageLoad": 60000}
Finished Request
POST http://127.0.0.1:47137/session/c63221a7-7d41-4ffe-98eb-6817df555659/url {"url": "https://google.com/"}
Finished Request
DELETE http://127.0.0.1:47137/session/c63221a7-7d41-4ffe-98eb-6817df555659 {}
Finished Request
Traceback (most recent call last):
File "/usr/lib/python3.5/code.py", line 91, in runcode
exec(code, self.locals)
File "<console>", line 3, in <module>
File "/webapps/openprescribing/.venv35/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
self.execute(Command.GET, {'url': url})
File "/webapps/openprescribing/.venv35/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/webapps/openprescribing/.venv35/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Timeout loading page after 60000ms
We first saw this on 11/11/19.
It is not the case that largeweb2 cannot make outbound HTTP connections:
>>> import requests
>>> requests.get('https://openprescribing.net')
<Response [200]>
Here's what changed in the deploy when we first noticed the problem: https://github.com/ebmdatalab/openprescribing/compare/6d242180b482e7773688b43347eedd15da627f55...4cf60cbb83d92d9c4ff732eace714953f028361b.
Nothing stands out.
The fact that there are __unicode__
to __str__
changes in there suggest that it's around the time of the Python 3 switch. I wonder if that has any bearing.
The fact that there are
__unicode__
to__str__
changes in there suggest that it's around the time of the Python 3 switch. I wonder if that has any bearing.
Don't think so -- this change happened weeks after the main Python 3 switch. And in any case, we know that the numbers checker was running under Python 3, because it caught #2106.
What exactly is selenium waiting for, which doesn't happen? It appears to await "document readiness", i.e. for the parser to be no longer active - or (from the spec) "if there is an ongoing attempt to navigate the current browsing context that has not yet matured, wait for navigation to mature".
This mentions that there are different strategies for awaiting page load - not that I think the other two possibilities sound useful.
It also suggests that hanging dependencies (e.g. external scripts) can cause errors like we're seeing here.
I wonder if it's actually the browser driver that is at fault - can it be upgraded and/or switched?
@inglesp can this be closed?
After most deploys,
check_numbers
raisesRuntimeError: Timed out requesting [path]
for somepath
, which is different each time.This happens when
browser.get(url)
times out after 60s.Most recently, the timeout was requesting
15N/
. (Sentry.)The NewRelic histogram of response times for the period including the last deploy shows no request to
frontend.views.views:ccg_home_page
taking more than 2.5s.