Selenium is timing out on largeweb2

inglesp commented 4 years ago

After most deploys, check_numbers raises RuntimeError: Timed out requesting [path] for some path, which is different each time.

This happens when browser.get(url) times out after 60s.

Most recently, the timeout was requesting 15N/. (Sentry.)

The NewRelic histogram of response times for the period including the last deploy shows no request to frontend.views.views:ccg_home_page taking more than 2.5s.

inglesp commented 4 years ago

This has nothing to do with deploys -- I have run ./manage.py check_numbers on largeweb2 twice with no deploys in progress, and Selenium timed out both times.

I have also run it from my own laptop, and it hasn't timed out.

So is something funny happening on largeweb2?

inglesp commented 4 years ago

Nothing to do with the numbers checker. Selenium times out when making requests to eg google.com.

>>> import logging
>>> 
>>> from selenium import webdriver
>>> from selenium.webdriver.firefox.options import Options
>>> from selenium.webdriver.remote.remote_connection import LOGGER as selenium_logger
>>> 
>>> ch = logging.StreamHandler()
>>> ch.setLevel(logging.DEBUG)
>>> 
>>> selenium_logger.setLevel(logging.DEBUG)
>>> selenium_logger.addHandler(ch)
>>> 
>>> options = Options()
>>> options.headless = True                                                                                                                                                                                                                                                     >>> with webdriver.Firefox(options=options) as browser:
...     browser.set_page_load_timeout(60)                                                                                                                                                                                                                                       ...     browser.get("https://google.com/")
... 
POST http://127.0.0.1:47137/session {"desiredCapabilities": {"acceptInsecureCerts": true, "moz:firefoxOptions": {"args": ["-headless"]}, "browserName": "firefox", "marionette": true}, "capabilities": {"alwaysMatch": {"acceptInsecureCerts": true, "moz:firefoxOptions": {"args": ["-headless"]}, "browserName": "firefox"}, "firstMatch": [{}]}}
Finished Request
POST http://127.0.0.1:47137/session/c63221a7-7d41-4ffe-98eb-6817df555659/timeouts {"pageLoad": 60000}
Finished Request
POST http://127.0.0.1:47137/session/c63221a7-7d41-4ffe-98eb-6817df555659/url {"url": "https://google.com/"}
Finished Request
DELETE http://127.0.0.1:47137/session/c63221a7-7d41-4ffe-98eb-6817df555659 {}
Finished Request
Traceback (most recent call last):
  File "/usr/lib/python3.5/code.py", line 91, in runcode
    exec(code, self.locals)
  File "<console>", line 3, in <module>
  File "/webapps/openprescribing/.venv35/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/webapps/openprescribing/.venv35/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/webapps/openprescribing/.venv35/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: Timeout loading page after 60000ms

inglesp commented 4 years ago

We first saw this on 11/11/19.

inglesp commented 4 years ago

It is not the case that largeweb2 cannot make outbound HTTP connections:

>>> import requests
>>> requests.get('https://openprescribing.net')
<Response [200]>

inglesp commented 4 years ago

Here's what changed in the deploy when we first noticed the problem: https://github.com/ebmdatalab/openprescribing/compare/6d242180b482e7773688b43347eedd15da627f55...4cf60cbb83d92d9c4ff732eace714953f028361b.

Nothing stands out.

evansd commented 4 years ago

The fact that there are __unicode__ to __str__ changes in there suggest that it's around the time of the Python 3 switch. I wonder if that has any bearing.

inglesp commented 4 years ago

The fact that there are __unicode__ to __str__ changes in there suggest that it's around the time of the Python 3 switch. I wonder if that has any bearing.

Don't think so -- this change happened weeks after the main Python 3 switch. And in any case, we know that the numbers checker was running under Python 3, because it caught #2106.

sebbacon commented 4 years ago

What exactly is selenium waiting for, which doesn't happen? It appears to await "document readiness", i.e. for the parser to be no longer active - or (from the spec) "if there is an ongoing attempt to navigate the current browsing context that has not yet matured, wait for navigation to mature".

This mentions that there are different strategies for awaiting page load - not that I think the other two possibilities sound useful.

It also suggests that hanging dependencies (e.g. external scripts) can cause errors like we're seeing here.

I wonder if it's actually the browser driver that is at fault - can it be upgraded and/or switched?

richiecroker commented 1 year ago

@inglesp can this be closed?

ebmdatalab / openprescribing

Selenium is timing out on largeweb2 #2256