SeleniumHQ / selenium

A browser automation framework and ecosystem.
https://selenium.dev
Apache License 2.0
30.64k stars 8.18k forks source link

[🐛 Bug]: Issue HTTPConnectionPool(host='localhost', port=63302): Max retries exceeded with url with Python multiprocessing #14627

Closed avalon60 closed 1 week ago

avalon60 commented 1 week ago

What happened?

I have a framework, which reads JSON that describes a set of tests to be executed. The JSON is interpreted by my program, apex_unit_tester.py. The tests are mapped out into streams to execute via the Python multiprocessing module. A function, process_test(), is called for each test, and this in turn instatiates a page testing object, PageExecutor, to test an application page, This object instantiates a webdriver and before it exits it performs a webdriver.quit().

The problem I am seeing is that if I run tests such that more than one test is executed in any one of the multiprocessing streams, the 1st test in the stream succeeds, but the subsequent tests hit an exception, always at the same point. Even with identical tests the first test goes OK, and the subsequent test fails. The degree of parallelism is set by a command line parameter. If I set it to 1 (i.e. force a single stream) the 1st test runs error free, yet any subsequent tests hit the same problem:

HTTPConnectionPool(host='localhost', port=56222): Max retries exceeded with url: /session/678c99e92516838db5278db9a2293b9b/element (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000193B8595630>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

As you can see, it appears that the Webdriver then carries on - it doesn't appear to have crashed. However. if I run the same tests such that only one test is allocated to each multiprocessing stream - they run without a glitch.

The code being executed, when the exception is raised, is the wait in this functiion:

def wait_for_element_by_id(driver, element_id, retries=3, delay=5, timeout=10):
    """
    Wait for an element to appear by its ID. Retry if connection fails.

    :param driver: WebDriver object
    :param element_id: str, ID of the element to locate
    :param retries: int, number of retries in case of connection errors
    :param delay: int, delay between retries in seconds
    :param timeout: int, maximum time to wait for the element
    :return: WebElement or None if not found
    """
    for attempt in range(retries):
        try:
            return WebDriverWait(driver, timeout).until(
                ec.presence_of_element_located((By.ID, element_id))
            )
        except TimeoutException:
            print(f"TimeoutException: Element with ID '{element_id}' not found within {timeout} seconds.")
        except WebDriverException as e:
            print(f"WebDriverException: {str(e)} - Retrying ({attempt+1}/{retries})")
            sleep(delay)  # wait before retrying
        except Exception as e:
            print(f"Unexpected exception: {str(e)}")
            return None
    print(f"Failed to find element with ID '{element_id}' after {retries} retries.")
    return None

I get the same results running this on Windows 11 and Linux Mint 21.3. I also get the same results using different application servers (one on one of my local machines and the other via work),

How can we reproduce the issue?

I'm not sure how this can be easily reporoduced. This is because I am testing APEX application pages. I have a Docker container which acts as a service that serves the pages for testing. APEX applications are stored in an Oracle database (inside the Docker container), and are served up via Oracle ORDS. The concept here is that we configure tests via a purposed APEX application, the APEX application allows pages from other APEX applications to be selected for testing. We declare actions to take place against selected widgets. Details are stored to the database. Another app acts as a launchpad. This app reads the configured tests as JSON and renders the JSON, where the apex_unit_tester.py reads th JSON and then uses Python multiprocessing to start testing the pages.

Relevant log output

0/10/2024 12:19:23 | CRITICAL | 280 | AW-21169 | 1901999 | 1 | Problem encountered on Popup LOV select_by_value action (action# 4).
20/10/2024 12:19:23 | ERROR    | 280 | AW-21169 | 1901999 | 1 | HTTPConnectionPool(host='localhost', port=63302): Max retries exceeded with url: /session/14be223f651b19f1078ffae85cee4d23/element (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000018BC5FC39B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
20/10/2024 12:19:27 | INFO     | 280 | AW-21169 | 1901999 | 1 | Action [type: Page]: PAGE vertical_scroll 500  
20/10/2024 12:19:31 | INFO     | 280 | AW-21169 | 1901999 | 1 | Action [type: N/A]: Sleep 2 seconds
20/10/2024 12:19:37 | ASSERTION SUCCESS | 280 | AW-21169 | 1901999 | 1 |  Condition: P1_FIRST_NAME eq Bilbo
20/10/2024 12:19:37 | ASSERTION SUCCESS | 280 | AW-21169 | 1901999 | 1 |  Condition: P1_LAST_NAME contains Baggins

Multiple tests:

APEX Unit Tester Started at: 2024-10-20 13:55:10 +0100
Unit test run id: UT1729428910

================== Key Parameters =================
Environment: aut1
Headless: False
Parallel degree: 1
Tester's name: cbostock
Release tag: 2024.3
Test tag: None
Results directory: C:\temp\results
Test run directory: C:\temp\results\UT1729428910
===================================================

20/10/2024 13:55:19 | WARNING  | Config parameter allow_sleeps is enabled!
20/10/2024 13:55:19 | UNIT STARTED | 2 | XXX-101010 | 1901999 | 1 | Simple test - success
20/10/2024 13:55:26 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_FIRST_NAME set_value Samwise  
20/10/2024 13:55:30 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_LAST_NAME set_value Gamgee  
20/10/2024 13:55:34 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER search Au  
20/10/2024 13:55:43 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER select_by_value Aut  
20/10/2024 13:55:47 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: Page]: PAGE vertical_scroll 700  
20/10/2024 13:55:51 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: N/A]: Sleep 2 seconds
20/10/2024 13:55:57 | ASSERTION SUCCESS | 2 | XXX-101010 | 1901999 | 1 |  Condition: P1_FIRST_NAME eq Samwise
20/10/2024 13:55:57 | ASSERTION SUCCESS | 2 | XXX-101010 | 1901999 | 1 |  Condition: P1_LAST_NAME contains Gamgee
20/10/2024 13:55:57 | ASSERTION SUCCESS | 2 | XXX-101010 | 1901999 | 1 |  Condition: R716104135023217338 gt_rows 5
20/10/2024 13:55:57 | ASSERTION SUCCESS | 2 | XXX-101010 | 1901999 | 1 |  Success rate: 3 out of 3 assertions passed.
20/10/2024 13:55:57 | ASSERTION SUCCESS | 2 | XXX-101010 | 1901999 | 1 |  Overall status: pass; predicated on "All assertions must pass"
20/10/2024 13:55:57 | PAGE TIMING | 2 | XXX-101010 | 1901999/1 | Simple test - success | 38212.13
20/10/2024 13:56:00 | UNIT COMPLETED | 2 | XXX-101010 | 1901999 | 1 | Simple test - success
20/10/2024 13:56:00 | UNIT STARTED | 3 | XXX-101010 | 1901999 | 1 | Simple test - success with OR
20/10/2024 13:56:07 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_FIRST_NAME set_value Bilbo  
20/10/2024 13:56:11 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_LAST_NAME set_value Baggins  
20/10/2024 13:56:15 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER search Au  
Unexpected exception: HTTPConnectionPool(host='localhost', port=56222): Max retries exceeded with url: /session/678c99e92516838db5278db9a2293b9b/element (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000193B851B280>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
'NoneType' object has no attribute 'find_element'
20/10/2024 13:56:35 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER select_by_value Audsys  
20/10/2024 13:56:51 | CRITICAL | 3 | XXX-101010 | 1901999 | 1 | Problem encountered on Popup LOV select_by_value action (action# 4).
20/10/2024 13:56:51 | ERROR    | 3 | XXX-101010 | 1901999 | 1 | HTTPConnectionPool(host='localhost', port=56222): Max retries exceeded with url: /session/678c99e92516838db5278db9a2293b9b/element (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000193B851B0A0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
20/10/2024 13:56:56 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: Page]: PAGE vertical_scroll 700  
20/10/2024 13:57:00 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: N/A]: Sleep 2 seconds
20/10/2024 13:57:06 | ASSERTION SUCCESS | 3 | XXX-101010 | 1901999 | 1 |  Condition: P1_FIRST_NAME eq Bilbo
20/10/2024 13:57:06 | ASSERTION FAIL | 3 | XXX-101010 | 1901999 | 1 |  Condition: R716104135023217338 eq_rows 5
20/10/2024 13:57:06 | ASSERTION SUCCESS | 3 | XXX-101010 | 1901999 | 1 |  Success rate: 1 out of 2 assertions passed.
20/10/2024 13:57:06 | ASSERTION SUCCESS | 3 | XXX-101010 | 1901999 | 1 |  Overall status: pass; predicated on "At least one assertion must pass"
20/10/2024 13:57:06 | PAGE TIMING | 3 | XXX-101010 | 1901999/1 | Simple test - success with OR | 65743.01
20/10/2024 13:57:08 | UNIT COMPLETED | 3 | XXX-101010 | 1901999 | 1 | Simple test - success with OR
20/10/2024 13:57:08 | UNIT STARTED | 1 | XXX-101010 | 1901999 | 1 | Simple test
20/10/2024 13:57:15 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_FIRST_NAME set_value Bilbo  
20/10/2024 13:57:19 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_LAST_NAME set_value Baggins  
20/10/2024 13:57:23 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER search Au  
Unexpected exception: HTTPConnectionPool(host='localhost', port=56222): Max retries exceeded with url: /session/678c99e92516838db5278db9a2293b9b/element (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000193B8595630>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
'NoneType' object has no attribute 'find_element'
20/10/2024 13:57:44 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER select_by_value Audsys  
20/10/2024 13:58:00 | CRITICAL | 1 | XXX-101010 | 1901999 | 1 | Problem encountered on Popup LOV select_by_value action (action# 4).
20/10/2024 13:58:00 | ERROR    | 1 | XXX-101010 | 1901999 | 1 | HTTPConnectionPool(host='localhost', port=56222): Max retries exceeded with url: /session/678c99e92516838db5278db9a2293b9b/element (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000193B8594490>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
20/10/2024 13:58:04 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: Page]: PAGE vertical_scroll 700  
20/10/2024 13:58:08 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: N/A]: Sleep 2 seconds
20/10/2024 13:58:14 | ASSERTION SUCCESS | 1 | XXX-101010 | 1901999 | 1 |  Condition: P1_FIRST_NAME eq Bilbo
20/10/2024 13:58:14 | ASSERTION FAIL | 1 | XXX-101010 | 1901999 | 1 |  Condition: R716104135023217338 eq_rows 5
20/10/2024 13:58:14 | ASSERTION FAIL | 1 | XXX-101010 | 1901999 | 1 |  Success rate: 1 out of 2 assertions passed.
20/10/2024 13:58:14 | ASSERTION FAIL | 1 | XXX-101010 | 1901999 | 1 |  Overall status: fail; predicated on "All assertions must pass"
20/10/2024 13:58:15 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Screenshot taken: 2024-10-20_135814_1_1901999_1.png
20/10/2024 13:58:15 | PAGE TIMING | 1 | XXX-101010 | 1901999/1 | Simple test | 65896.78
20/10/2024 13:58:17 | UNIT COMPLETED | 1 | XXX-101010 | 1901999 | 1 | Simple test
Sub-task Logs have been concatenated to: C:\temp\results\UT1729428910\logs\UT1729428910.log
APEX Unit Tester completed at: 2024-10-20 13:58:17 +0100

Same 3 tests, but with parallel degree 3:

APEX Unit Tester Started at: 2024-10-20 15:01:37 +0100
Unit test run id: UT1729432897

================== Key Parameters =================
Environment: aut1
Headless: False
Parallel degree: 3
Tester's name: cbostock
Release tag: 2024.3
Test tag: None
Results directory: C:\temp\results
Test run directory: C:\temp\results\UT1729432897
===================================================

20/10/2024 15:01:47 | WARNING  | Config parameter allow_sleeps is enabled!
20/10/2024 15:01:47 | WARNING  | Config parameter allow_sleeps is enabled!
20/10/2024 15:01:47 | WARNING  | Config parameter allow_sleeps is enabled!
20/10/2024 15:01:47 | UNIT STARTED | 1 | XXX-101010 | 1901999 | 1 | Simple test
20/10/2024 15:01:47 | UNIT STARTED | 2 | XXX-101010 | 1901999 | 1 | Simple test - success
20/10/2024 15:01:47 | UNIT STARTED | 3 | XXX-101010 | 1901999 | 1 | Simple test - success with OR
20/10/2024 15:01:54 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_FIRST_NAME set_value Bilbo  
20/10/2024 15:01:54 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_FIRST_NAME set_value Samwise  
20/10/2024 15:01:54 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_FIRST_NAME set_value Bilbo  
20/10/2024 15:01:58 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_LAST_NAME set_value Baggins  
20/10/2024 15:01:58 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_LAST_NAME set_value Gamgee  
20/10/2024 15:01:58 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: Text Field]: P1_LAST_NAME set_value Baggins  
20/10/2024 15:02:02 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER search Au  
20/10/2024 15:02:02 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER search Au  
20/10/2024 15:02:02 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER search Au  
20/10/2024 15:02:11 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER select_by_value Audsys  
20/10/2024 15:02:11 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER select_by_value Aut  
20/10/2024 15:02:12 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: Popup LOV]: P1_OWNER select_by_value Audsys  
20/10/2024 15:02:15 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: Page]: PAGE vertical_scroll 700  
20/10/2024 15:02:15 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: Page]: PAGE vertical_scroll 700  
20/10/2024 15:02:16 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: Page]: PAGE vertical_scroll 700  
20/10/2024 15:02:19 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Action [type: N/A]: Sleep 2 seconds
20/10/2024 15:02:19 | INFO     | 2 | XXX-101010 | 1901999 | 1 | Action [type: N/A]: Sleep 2 seconds
20/10/2024 15:02:20 | INFO     | 3 | XXX-101010 | 1901999 | 1 | Action [type: N/A]: Sleep 2 seconds
20/10/2024 15:02:25 | ASSERTION SUCCESS | 1 | XXX-101010 | 1901999 | 1 |  Condition: P1_FIRST_NAME eq Bilbo
20/10/2024 15:02:25 | ASSERTION SUCCESS | 2 | XXX-101010 | 1901999 | 1 |  Condition: P1_FIRST_NAME eq Samwise
20/10/2024 15:02:26 | ASSERTION FAIL | 1 | XXX-101010 | 1901999 | 1 |  Condition: R716104135023217338 eq_rows 5
20/10/2024 15:02:26 | ASSERTION FAIL | 1 | XXX-101010 | 1901999 | 1 |  Success rate: 1 out of 2 assertions passed.
20/10/2024 15:02:26 | ASSERTION FAIL | 1 | XXX-101010 | 1901999 | 1 |  Overall status: fail; predicated on "All assertions must pass"
20/10/2024 15:02:26 | ASSERTION SUCCESS | 2 | XXX-101010 | 1901999 | 1 |  Condition: P1_LAST_NAME contains Gamgee
20/10/2024 15:02:26 | INFO     | 1 | XXX-101010 | 1901999 | 1 | Screenshot taken: 2024-10-20_150226_1_1901999_1.png
20/10/2024 15:02:26 | PAGE TIMING | 1 | XXX-101010 | 1901999/1 | Simple test | 38661.48
20/10/2024 15:02:26 | ASSERTION SUCCESS | 2 | XXX-101010 | 1901999 | 1 |  Condition: R716104135023217338 gt_rows 5
20/10/2024 15:02:26 | ASSERTION SUCCESS | 2 | XXX-101010 | 1901999 | 1 |  Success rate: 3 out of 3 assertions passed.
20/10/2024 15:02:26 | ASSERTION SUCCESS | 2 | XXX-101010 | 1901999 | 1 |  Overall status: pass; predicated on "All assertions must pass"
20/10/2024 15:02:26 | PAGE TIMING | 2 | XXX-101010 | 1901999/1 | Simple test - success | 38787.37
20/10/2024 15:02:26 | ASSERTION SUCCESS | 3 | XXX-101010 | 1901999 | 1 |  Condition: P1_FIRST_NAME eq Bilbo
20/10/2024 15:02:26 | ASSERTION FAIL | 3 | XXX-101010 | 1901999 | 1 |  Condition: R716104135023217338 eq_rows 5
20/10/2024 15:02:26 | ASSERTION SUCCESS | 3 | XXX-101010 | 1901999 | 1 |  Success rate: 1 out of 2 assertions passed.
20/10/2024 15:02:26 | ASSERTION SUCCESS | 3 | XXX-101010 | 1901999 | 1 |  Overall status: pass; predicated on "At least one assertion must pass"
20/10/2024 15:02:26 | PAGE TIMING | 3 | XXX-101010 | 1901999/1 | Simple test - success with OR | 39290.69
20/10/2024 15:02:28 | UNIT COMPLETED | 1 | XXX-101010 | 1901999 | 1 | Simple test
20/10/2024 15:02:28 | UNIT COMPLETED | 2 | XXX-101010 | 1901999 | 1 | Simple test - success
20/10/2024 15:02:29 | UNIT COMPLETED | 3 | XXX-101010 | 1901999 | 1 | Simple test - success with OR
Sub-task Logs have been concatenated to: C:\temp\results\UT1729432897\logs\UT1729432897.log
APEX Unit Tester completed at: 2024-10-20 15:02:29 +0100

Operating System

Windows 11 and Linux Mint 21.3 (Ubuntu-ish)

Selenium version

Python 3.12.2 (Windows 11); Python 3.10.12 (Linux Mint); Seleium: Reproduced with 4.12 and 4.25

What are the browser(s) and version(s) where you see this issue?

Chrome => Windows: Version 129.0.6668.101 (Official Build) (64-bit) Linux: Version 124.0.6367.201 (Official Build) (64-bit)

What are the browser driver(s) and version(s) where you see this issue?

Windows: 129.0.6668.100 / Linux: ChromeDriver 124.0.6367.207

Are you using Selenium Grid?

No

github-actions[bot] commented 1 week ago

@avalon60, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

avalon60 commented 1 week ago

I think I may have found the issue. I was caching dom elements and I have a feeling that I messed someting up. Apologies for any inconvenience.