[🚀 Feature]: Expose browser specific functionalities for remote webdrivers in Python

j3soon commented 1 year ago

Feature and motivation

As mentioned in saucelabs/sauce-docs#1621, workarounds are required to use browser-specific functionalities in remote webdrivers. The current workaround for Python is much shorter than the one for C#, but much longer than using the Augmenter in Java.

I have noticed that the workarounds can be greatly simplified with some minor modifications in the Selenium codebase. Since Python is dynamically typed, implementing this feature in Python is much easier than in Java/C#. If the potential modifications are appropriate, I will open a PR.

The following are workarounds for four browser-specific functionalities:

Network Conditions
Full-Page Screenshots
Install and Uninstall Add-ons
Change Preferences During Session

Network Conditions

The command entries for network conditions are registered during the initialization of ChromiumRemoteConnection, so they are not set for remote webdrivers. Therefore, we need to register the three entries manually:

set_network_conditions uses _commands["setNetworkConditions"],
get_network_conditions uses _commands["getNetworkConditions"], and
delete_network_conditions uses _commands["deleteNetworkConditions"].
See the source code for further info.

from selenium import webdriver

driver = webdriver.Remote(
    command_executor='http://localhost:4444/wd/hub',
    options=webdriver.ChromeOptions(),
)

driver.command_executor._commands["setNetworkConditions"] = ("POST", "/session/$sessionId/chromium/network_conditions")
driver.command_executor._commands["getNetworkConditions"] = ("GET", "/session/$sessionId/chromium/network_conditions")
driver.command_executor._commands["deleteNetworkConditions"] = ("DELETE", "/session/$sessionId/chromium/network_conditions")

webdriver.Chrome.set_network_conditions(
    driver,
    offline=False,
    latency=5,  # additional latency (ms)
    download_throughput=500 * 1024,  # maximal throughput
    upload_throughput=500 * 1024,  # maximal throughput
)
ret = webdriver.Chrome.get_network_conditions(driver)
print(ret)
webdriver.Chrome.delete_network_conditions(driver)
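Each registered entry maps a command name to an HTTP method and a URL template containing `$sessionId`. The toy resolver below mimics that lookup, assuming the template is filled in with Python's `string.Template` (our reading of how Selenium's `RemoteConnection` builds the path); `COMMANDS` and `resolve` are hypothetical names, and no request is actually sent:

```python
import string

# Same shape as the entries registered above: name -> (HTTP method, URL template)
COMMANDS = {
    "setNetworkConditions": ("POST", "/session/$sessionId/chromium/network_conditions"),
    "getNetworkConditions": ("GET", "/session/$sessionId/chromium/network_conditions"),
    "deleteNetworkConditions": ("DELETE", "/session/$sessionId/chromium/network_conditions"),
}


def resolve(command, session_id, base_url="http://localhost:4444/wd/hub"):
    """Hypothetical helper: turn a command name into (method, full URL)."""
    if command not in COMMANDS:
        # An unregistered name cannot be resolved to a route at all
        raise KeyError(f"unrecognised command: {command}")
    method, template = COMMANDS[command]
    # Substitute the session id into the URL template
    path = string.Template(template).substitute(sessionId=session_id)
    return method, base_url + path


print(resolve("getNetworkConditions", "abc123"))
```

This is why the workaround only has to add three dictionary entries: once the name resolves to a route, the generic `execute` path handles the rest.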

Potential modifications: If we modify the Selenium codebase to register the entries at the beginning of webdriver.Chrome.*_network_conditions instead of during the initialization of ChromiumRemoteConnection, the entries no longer need to be set manually.
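A minimal sketch of that modification, assuming a method could register its own entries just before executing. `_ensure_commands`, `NETWORK_CONDITIONS_COMMANDS`, `StubExecutor`, and `StubDriver` are hypothetical; only the command names and routes come from the workaround above:

```python
NETWORK_CONDITIONS_COMMANDS = {
    "setNetworkConditions": ("POST", "/session/$sessionId/chromium/network_conditions"),
    "getNetworkConditions": ("GET", "/session/$sessionId/chromium/network_conditions"),
    "deleteNetworkConditions": ("DELETE", "/session/$sessionId/chromium/network_conditions"),
}


def _ensure_commands(command_executor, commands):
    """Register missing entries on the executor; existing entries are left untouched."""
    for name, route in commands.items():
        command_executor._commands.setdefault(name, route)


class StubExecutor:
    """Stand-in for a plain remote connection that lacks the Chromium entries."""

    def __init__(self):
        self._commands = {"getTitle": ("GET", "/session/$sessionId/title")}


class StubDriver:
    """Stand-in driver: resolves commands instead of sending HTTP requests."""

    def __init__(self):
        self.command_executor = StubExecutor()

    def execute(self, command, params=None):
        method, path = self.command_executor._commands[command]  # KeyError if unregistered
        return {"method": method, "path": path, "params": params}


def get_network_conditions(driver):
    # The proposed change: register the entries at the start of the method itself
    _ensure_commands(driver.command_executor, NETWORK_CONDITIONS_COMMANDS)
    return driver.execute("getNetworkConditions")


driver = StubDriver()
print(get_network_conditions(driver)["method"])
```

Using `setdefault` keeps the registration idempotent, so calling the method repeatedly, or on a connection that already has the entries, changes nothing.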

Full-Page Screenshots

The simple workaround works for:

get_full_page_screenshot_as_base64, since it simply calls self.execute("FULL_PAGE_SCREENSHOT").
See the source code for further info.

But we cannot directly call:

get_full_page_screenshot_as_file,
save_full_page_screenshot, and
get_full_page_screenshot_as_png,

because each of them calls a self.get_full_page_screenshot* method, which a Remote driver does not define.
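The failure is ordinary attribute lookup: when one of these methods is borrowed from the class, `self` is the `Remote` instance, so `self.get_full_page_screenshot_as_base64` is looked up on an object that does not have it. The toy classes below are stand-ins written for illustration (not Selenium's classes) and reproduce the same pattern without a browser:

```python
import base64


class Remote:
    """Stand-in for webdriver.Remote: can execute commands, has no browser-specific methods."""

    def execute(self, command):
        if command == "FULL_PAGE_SCREENSHOT":
            return {"value": base64.b64encode(b"fake-png").decode("ascii")}
        raise ValueError(command)


class Firefox(Remote):
    """Stand-in for webdriver.Firefox."""

    def get_full_page_screenshot_as_base64(self):
        # Only relies on self.execute, which Remote also has
        return self.execute("FULL_PAGE_SCREENSHOT")["value"]

    def get_full_page_screenshot_as_png(self):
        # Looks up another Firefox-only method on self
        return base64.b64decode(self.get_full_page_screenshot_as_base64().encode("ascii"))


remote = Remote()
print(Firefox.get_full_page_screenshot_as_base64(remote))  # works
try:
    Firefox.get_full_page_screenshot_as_png(remote)
except AttributeError as exc:
    print("AttributeError:", exc)
```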

However, we can simply copy and rewrite the functions above, replacing the calls to self.get_full_page_screenshot* with calls that take the driver as an explicit argument.

import base64
import warnings
from selenium import webdriver

def get_full_page_screenshot_as_file(driver, filename) -> bool:
    """
    Saves a full document screenshot of the current window to a PNG image file. Returns
        False if there is any IOError, else returns True. Use full paths in
        your filename.

    :Args:
        - filename: The full path you wish to save your screenshot to. This
        should end with a `.png` extension.

    :Usage:
        ::

            get_full_page_screenshot_as_file(driver, '/Screenshots/foo.png')
    """
    if not filename.lower().endswith(".png"):
        warnings.warn(
            "name used for saved screenshot does not match file " "type. It should end with a `.png` extension",
            UserWarning,
        )
    png = get_full_page_screenshot_as_png(driver)
    try:
        with open(filename, "wb") as f:
            f.write(png)
    except OSError:
        return False
    finally:
        del png
    return True

def save_full_page_screenshot(driver, filename) -> bool:
    """
    Saves a full document screenshot of the current window to a PNG image file. Returns
        False if there is any IOError, else returns True. Use full paths in
        your filename.

    :Args:
        - filename: The full path you wish to save your screenshot to. This
        should end with a `.png` extension.

    :Usage:
        ::

            save_full_page_screenshot(driver, '/Screenshots/foo.png')
    """
    return get_full_page_screenshot_as_file(driver, filename)

def get_full_page_screenshot_as_png(driver) -> bytes:
    """
    Gets the full document screenshot of the current window as binary data.

    :Usage:
        ::

            get_full_page_screenshot_as_png(driver)
    """
    return base64.b64decode(webdriver.Firefox.get_full_page_screenshot_as_base64(driver).encode("ascii"))

driver = webdriver.Remote(
    command_executor='http://localhost:4444/wd/hub',
    options=webdriver.FirefoxOptions(),
)

b64 = webdriver.Firefox.get_full_page_screenshot_as_base64(driver)
print('get_full_page_screenshot_as_base64', b64)
png = get_full_page_screenshot_as_png(driver)
print('get_full_page_screenshot_as_png', png)
ret = get_full_page_screenshot_as_file(driver, 'screenshot1.png')
print('get_full_page_screenshot_as_file', ret)
ret = save_full_page_screenshot(driver, 'screenshot2.png')
print('save_full_page_screenshot', ret)

Potential modifications: Change all calls to self.get_full_page_screenshot*(...) into explicit class calls, webdriver.Firefox.get_full_page_screenshot*(self, ...), so that they also work when self is a Remote driver.
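In toy form (stand-in classes again, not Selenium's), the proposed change looks like this. Because the driver inside the class is `self`, the explicit class call passes `self`; afterwards nothing is looked up on `self` except `execute`:

```python
import base64


class Remote:
    """Stand-in for webdriver.Remote."""

    def execute(self, command):
        if command == "FULL_PAGE_SCREENSHOT":
            return {"value": base64.b64encode(b"fake-png").decode("ascii")}
        raise ValueError(command)


class Firefox(Remote):
    """Stand-in for webdriver.Firefox after the proposed change."""

    def get_full_page_screenshot_as_base64(self):
        return self.execute("FULL_PAGE_SCREENSHOT")["value"]

    def get_full_page_screenshot_as_png(self):
        # Call through the class so this also works when self is a plain Remote
        b64 = Firefox.get_full_page_screenshot_as_base64(self)
        return base64.b64decode(b64.encode("ascii"))


print(Firefox.get_full_page_screenshot_as_png(Remote()))
```

The same method still behaves normally on a real `Firefox` instance, so local users are unaffected.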

Install and Uninstall Add-ons

The simple workaround works since:

install_addon base64 encodes the zipped add-on and calls self.execute("INSTALL_ADDON", ...), while
uninstall_addon simply calls self.execute("UNINSTALL_ADDON", ...).
See the source code for further info.
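The payload that install_addon sends can also be built by hand. A minimal sketch, assuming the `{"addon": ..., "temporary": ...}` shape used in the raw `driver.execute("INSTALL_ADDON", payload)` snippet later in this thread; zipping an unpacked add-on directory in memory mirrors our understanding of what Selenium's `install_addon` does, and `build_install_addon_payload` is a hypothetical name:

```python
import base64
import io
import os
import zipfile


def build_install_addon_payload(path, temporary=False):
    """Hypothetical helper: build the INSTALL_ADDON payload for an .xpi file or an unpacked add-on directory."""
    if os.path.isdir(path):
        # Zip the directory in memory, storing paths relative to the add-on root
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for root, _, files in os.walk(path):
                for name in files:
                    full_path = os.path.join(root, name)
                    archive.write(full_path, os.path.relpath(full_path, path))
        data = buffer.getvalue()
    else:
        # An .xpi is already a zip archive, so its bytes are sent as-is
        with open(path, "rb") as file:
            data = file.read()
    return {"addon": base64.b64encode(data).decode("UTF-8"), "temporary": temporary}
```

With a remote driver this could then be sent as `driver.execute("INSTALL_ADDON", build_install_addon_payload("addon.xpi"))`.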

from selenium import webdriver

driver = webdriver.Remote(
    command_executor='http://localhost:4444/wd/hub',
    options=webdriver.FirefoxOptions(),
)

addon_id = webdriver.Firefox.install_addon(driver, "resources/ninja_saucebot-1.0-an+fx.xpi")
webdriver.Firefox.uninstall_addon(driver, addon_id)

Potential modifications: (No modification required)

Change Preferences During Session

The simple workaround works for:

set_context, since it simply calls self.execute("SET_CONTEXT", ...),

but some copying and rewriting is required for:

context, because it calls self.set_context.
See the source code for further info.

from contextlib import contextmanager
from selenium import webdriver

@contextmanager
def context(driver, context):
    """Sets the context that Selenium commands are running in using
    a `with` statement. The state of the context on the server is
    saved before entering the block, and restored upon exiting it.

    :param context: Context, may be one of the class properties
        `CONTEXT_CHROME` or `CONTEXT_CONTENT`.

    Usage example::

        with context(driver, webdriver.Firefox.CONTEXT_CHROME):
            # chrome scope
            ... do stuff ...
    """
    initial_context = driver.execute("GET_CONTEXT").pop("value")
    webdriver.Firefox.set_context(driver, context)
    try:
        yield
    finally:
        webdriver.Firefox.set_context(driver, initial_context)

driver = webdriver.Remote(
    command_executor='http://localhost:4444/wd/hub',
    options=webdriver.FirefoxOptions(),
)

webdriver.Firefox.set_context(driver, webdriver.Firefox.CONTEXT_CHROME)
# chrome scope
webdriver.Firefox.set_context(driver, webdriver.Firefox.CONTEXT_CONTENT)

with context(driver, webdriver.Firefox.CONTEXT_CHROME):
    # chrome scope
    pass

Potential modifications: Change all calls to self.set_context(...) into explicit class calls, webdriver.Firefox.set_context(self, ...), so that they also work when self is a Remote driver.

Usage example

After the potential modifications are applied, we can easily use all four browser-specific functionalities in Python with just a few lines of code:

Network Conditions

webdriver.Chrome.set_network_conditions(driver, ...)
ret = webdriver.Chrome.get_network_conditions(driver)
webdriver.Chrome.delete_network_conditions(driver)

Full-Page Screenshots

b64 = webdriver.Firefox.get_full_page_screenshot_as_base64(driver)
png = webdriver.Firefox.get_full_page_screenshot_as_png(driver)
ret = webdriver.Firefox.get_full_page_screenshot_as_file(driver, 'screenshot1.png')
ret = webdriver.Firefox.save_full_page_screenshot(driver, 'screenshot2.png')

Install and Uninstall Add-ons

addon_id = webdriver.Firefox.install_addon(driver, "addon.xpi")
webdriver.Firefox.uninstall_addon(driver, addon_id)

Change Preferences During Session

webdriver.Firefox.set_context(driver, webdriver.Firefox.CONTEXT_CHROME)
# chrome scope
webdriver.Firefox.set_context(driver, webdriver.Firefox.CONTEXT_CONTENT)
with webdriver.Firefox.context(driver, webdriver.Firefox.CONTEXT_CHROME):
    # chrome scope
    pass

github-actions[bot] commented 1 year ago

@j3soon, thank you for creating this issue. We will troubleshoot it as soon as we can.

Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then add the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

titusfortner commented 1 year ago

Yes, please open a PR. We really appreciate your help with this!

github-actions[bot] commented 1 year ago

This issue is looking for contributors.

Please comment below or reach out to us through our IRC/Slack/Matrix channels if you are interested.

symonk commented 1 year ago

Without thinking too much (I will read the thread fully soon; thanks for a very in-depth post): would some sort of adapter/augmenter in Python be a better approach for the future? A lot of the above is quite a hack with respect to how Python functions/methods work and the descriptor protocol that underpins them. I'd be -1 on calling functions on the class and passing a separate driver instance in to work around it; it feels like the API is lackluster, and we have some shortcomings we should address with a proper API. I'm just thinking of the future maintenance of doing it like the above. I will have a skim over the PR shortly. Thanks!

j3soon commented 1 year ago

@symonk, I agree with you that implementing a unified API (Augmenter) for all languages (Python, C#, etc.) is a much better approach in the long term. However, the Augmenter in Java is still in beta, and is not documented yet.

We can either:

(1) Wait until the Augmenter in Java is out of beta, design a proper unified API, and then implement it for all languages (including Python), or
(2) apply these Python-specific hacks as beta/temporary features, and add some tests to prevent future regressions.

I also dislike the Python-specific hacks of option (2), since they are only a short-term workaround and may cause breaking changes when we implement a proper API in the future. Although PR #11500 shows that this workaround can be achieved very easily, it's only a proof of concept, and I don't think we should merge it.

The main motivation for this issue is the lack of documentation on how these functionalities differ between local and remote webdrivers, which causes a lot of trouble for remote webdriver users. Fortunately, the documentation will be updated after PR https://github.com/SeleniumHQ/seleniumhq.github.io/pull/1267 is merged. So it's totally fine to close PR #11500 and go with option (1): wait until the Augmenter in Java is out of beta.

Meanwhile, if anyone wants to use these functionalities with remote webdrivers, they can find this issue from the docs and copy the lengthy workaround for now.

titusfortner commented 1 year ago

There's not going to be a unified API; each language has its own way of dealing with this, and the beta annotation in Java can likely be removed at this point.

Python might be better off following the Ruby approach of using mixins to add methods to the class based on the browser name.
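A rough sketch of what a mixin-based approach could look like in Python, assuming the feature set is chosen from the session's `browserName` capability. Every name here (`StubRemote`, `FirefoxFeaturesMixin`, `ChromiumFeaturesMixin`, `augment`) is hypothetical, and swapping `__class__` on a live instance is just one possible mechanism, loosely analogous to Ruby's `extend`:

```python
class StubRemote:
    """Stand-in for webdriver.Remote: knows its capabilities and records executed commands."""

    def __init__(self, browser_name):
        self.caps = {"browserName": browser_name}
        self.sent = []

    def execute(self, command, params=None):
        self.sent.append((command, params))
        return {"value": None}


class FirefoxFeaturesMixin:
    def uninstall_addon(self, identifier):
        self.execute("UNINSTALL_ADDON", {"id": identifier})


class ChromiumFeaturesMixin:
    def delete_network_conditions(self):
        self.execute("deleteNetworkConditions")


MIXINS = {"firefox": FirefoxFeaturesMixin, "chrome": ChromiumFeaturesMixin}


def augment(driver):
    """Return the same driver object with browser-specific methods mixed in."""
    mixin = MIXINS.get(driver.caps.get("browserName"))
    if mixin is None:
        return driver  # unknown browser: leave the driver unchanged
    # Build a subclass that combines the mixin with the driver's current class
    augmented_cls = type(f"Augmented{type(driver).__name__}", (mixin, type(driver)), {})
    driver.__class__ = augmented_cls
    return driver


driver = augment(StubRemote("firefox"))
driver.uninstall_addon("addon@example.com")
print(driver.sent)
```

Unlike calling `webdriver.Firefox.some_method(driver)`, the augmented object still passes `isinstance` checks against its original class and exposes the extra methods through normal attribute access.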

I didn't see any tests in the PR to understand what the user API would look like with that code. It's ok to put some of the burden on the user, so long as everything actually works.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 280 days with no activity. Remove stale label or comment or this will be closed in 14 days.

titusfortner commented 10 months ago

Because of the (hacky) RemoteConnection code we have that pulls in subclass definitions, we currently can do:

import base64
from selenium import webdriver

# server, options, and addon_path are assumed to be defined elsewhere
driver = webdriver.Remote(command_executor=server, options=options)

with open(addon_path, "rb") as file:
    addon = base64.b64encode(file.read()).decode("UTF-8")

payload = {"addon": addon, "temporary": False}
driver.execute("INSTALL_ADDON", payload)

So what about creating an install_addon() method in a FirefoxFeatures() class, having webdriver.FirefoxOptions().install_addon() call that, and requiring remote driver users to call it from FirefoxFeatures() directly?

@isaulv / @AutomatedTester / @symonk does this sound like a reasonable solution for this issue?

e.g.:

webdriver.firefox.FirefoxFeatures(driver).install_addon(path_to_addon)
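A minimal sketch of what such a class could look like, assuming it only needs a driver object that exposes `execute`. The name `FirefoxFeatures` comes from the proposal above; the method bodies and the recording `StubDriver` are hypothetical, with the payload shape taken from the snippet above:

```python
import base64


class FirefoxFeatures:
    """Hypothetical wrapper: Firefox-only commands for any driver that has execute()."""

    def __init__(self, driver):
        self._driver = driver

    def install_addon(self, path, temporary=False):
        with open(path, "rb") as file:
            addon = base64.b64encode(file.read()).decode("UTF-8")
        payload = {"addon": addon, "temporary": temporary}
        # Return the add-on id reported by the driver
        return self._driver.execute("INSTALL_ADDON", payload).get("value")

    def uninstall_addon(self, identifier):
        self._driver.execute("UNINSTALL_ADDON", {"id": identifier})


class StubDriver:
    """Records commands instead of talking to a Grid."""

    def __init__(self):
        self.sent = []

    def execute(self, command, params=None):
        self.sent.append((command, params))
        return {"value": "addon@example.com"}
```

With a real session this would be called as `FirefoxFeatures(driver).install_addon("addon.xpi")`; since the wrapper only relies on `execute`, the same call should work for local and remote drivers alike.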

titusfortner commented 10 months ago

The other piece is the aforementioned hacky implementation of RemoteConnection that knows about its subclasses. I don't think this should be automatic; the user should pass in the subclassed remote connection if they want the subclassed features. Maybe that needs to be part of the client config class I'm working on, but I think that's a separate issue.
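To illustrate the difference between the two designs, here is a toy comparison in which every class and function is hypothetical: in the implicit style a connection class is found among the base class's subclasses by browser name, whereas in the explicit style the caller constructs the subclassed connection (and so opts in to its extra commands) directly. The `INSTALL_ADDON` route is the geckodriver endpoint as we understand it:

```python
class RemoteConnection:
    """Stand-in base connection with only a standard command."""

    extra_commands = {}

    def __init__(self, url):
        self.url = url
        self._commands = {"getTitle": ("GET", "/session/$sessionId/title")}
        # Subclasses contribute browser-specific routes through extra_commands
        self._commands.update(self.extra_commands)


class FirefoxConnection(RemoteConnection):
    browser_name = "firefox"
    extra_commands = {"INSTALL_ADDON": ("POST", "/session/$sessionId/moz/addon/install")}


def connection_for(browser_name, url):
    """Implicit style: search known subclasses by browser name."""
    for cls in RemoteConnection.__subclasses__():
        if getattr(cls, "browser_name", None) == browser_name:
            return cls(url)
    return RemoteConnection(url)


implicit = connection_for("firefox", "http://localhost:4444")
explicit = FirefoxConnection("http://localhost:4444")  # caller opts in explicitly
print("INSTALL_ADDON" in implicit._commands, "INSTALL_ADDON" in explicit._commands)
```

Both end up with the same commands; the difference is only who decides, which is the point being raised about making the subclass choice visible to the user.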

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 280 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been stalled for 14 days with no activity.

SeleniumHQ / selenium