SeleniumHQ / htmlunit-driver

WebDriver compatible driver for HtmlUnit headless browser.
Apache License 2.0

how to start selenium-server with htmlunit? #120

Closed: velkyvont closed this issue 3 months ago

velkyvont commented 2 years ago

Hi, I want to use htmlunit-driver for web crawling. I'm currently using chromedriver via Selenium, which takes a lot of CPU. Since my scripts are in Python and I have no Java knowledge, I figured after some research that I must start it via a selenium-server remote connection. I spent a lot of time trying to figure this out, but whatever I tried just didn't work. Could someone please help?

yamajun commented 2 years ago

I can use htmlunit-driver with the OLD selenium-server (3.x).

java -cp selenium-server-standalone-3.141.59.jar:htmlunit-driver-2.64.0-jar-with-dependencies.jar org.openqa.grid.selenium.GridLauncherV3

(Note: "org.openqa.grid.selenium.GridLauncherV3" is defined as the "Main-Class" in META-INF/MANIFEST.MF)

Sample test written in Python:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

selenium_hub_url = "http://localhost:4444/wd/hub"
htmlunit_capabilities = DesiredCapabilities.HTMLUNITWITHJS.copy()

driver = webdriver.Remote(
    command_executor = selenium_hub_url,
    desired_capabilities = htmlunit_capabilities
)
driver.get("https://example.com")
print("Title: " + driver.title)
assert "Example Domain" in driver.title
driver.quit()

Install the matching Python client and run the test (saved as test.py):

pip install selenium==3.141.0
python test.py
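
Before running a test like the one above, it can help to confirm that the server is actually up. The following is a minimal sketch using only the standard library. It assumes the server answers the WebDriver status command with a JSON body of the form {"value": {"ready": ...}}; the function names `is_ready` and `wait_for_server` are illustrative and not part of Selenium.

```python
import json
import time
import urllib.error
import urllib.request


def is_ready(status_body):
    """Return True if a WebDriver status response reports ready == true."""
    if not isinstance(status_body, dict):
        return False
    value = status_body.get("value")
    return isinstance(value, dict) and value.get("ready") is True


def wait_for_server(status_url, timeout_s=30.0, interval_s=1.0):
    """Poll status_url until the server reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(status_url, timeout=5) as resp:
                if is_ready(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            # Server not listening yet, or the response was not valid JSON.
            pass
        time.sleep(interval_s)
    return False
```

With the hub URL used in the sample, the call would presumably be `wait_for_server("http://localhost:4444/wd/hub/status")`; the exact status path is an assumption derived from that URL.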

Unfortunately, I can't use htmlunit-driver with selenium-4.x.y.

java -jar selenium-server-4.4.0.jar --ext htmlunit-driver-3.64.0-jar-with-dependencies.jar standalone -I htmlunit

or

java -cp selenium-server-4.4.0.jar:htmlunit-driver-3.64.0-jar-with-dependencies.jar org.openqa.selenium.grid.Bootstrap standalone -I htmlunit

02:01:02.290 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
02:01:02.298 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
02:01:03.302 INFO [NodeOptions.getSessionFactories] - Detected 8 available processors
02:01:03.362 INFO [NodeOptions.discoverDrivers] - Discovered 3 driver(s)
02:01:03.391 WARN [NodeOptions.lambda$addSpecificDrivers$20] - Could not find htmlunit driver on PATH.
java.lang.reflect.InvocationTargetException
(...snip...)
Caused by: org.openqa.selenium.grid.config.ConfigException: No drivers were found for [htmlunit]
    at org.openqa.selenium.grid.node.config.NodeOptions.addSpecificDrivers(NodeOptions.java:477)
    at org.openqa.selenium.grid.node.config.NodeOptions.getSessionFactories(NodeOptions.java:211)
    at org.openqa.selenium.grid.node.local.LocalNodeFactory.create(LocalNodeFactory.java:79)
    ... 22 more

sbabcoc commented 1 year ago

I think the issue is that there's no implementation of WebDriverInfo to provide the information that NodeOptions needs to match htmlunit.

sbabcoc commented 4 months ago

I've created the HtmlUnit Remote project to resolve this issue.

rbri commented 3 months ago

Yes, we now have full grid support - THANKS @sbabcoc!

sbabcoc commented 3 months ago

@velkyvont You can find details here: https://github.com/sbabcoc/htmlunit-remote

nicolas-mng commented 1 month ago

This is pretty cool, thanks! I now have my Selenium server running HtmlUnit for use by my Python app, but how do you pass extra settings such as throwExceptionOnScriptError and enableJavacript? I tried putting them under stereotype in htmlunit.toml, but the server does not seem to take them into account. Passing them through a Selenium Options object does not work either, since they are not W3C compliant.
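
For context on the W3C point: the WebDriver specification accepts only standard capability names or extension capabilities whose names contain a colon (a vendor prefix), so a bare key such as throwExceptionOnScriptError is rejected at the top level. The sketch below builds a New Session request body that nests such a setting under a prefixed key. The key `example:options` is a placeholder chosen for illustration; which key (if any) HtmlUnit Remote actually reads is not stated in this thread, and `build_new_session_payload` is a hypothetical helper.

```python
import json


def build_new_session_payload(browser_name, vendor_key, vendor_options):
    """Build a W3C New Session body with one extension capability.

    vendor_key must contain ':' because the W3C spec requires extension
    capability names to carry a prefix separated by a colon.
    """
    if ":" not in vendor_key:
        raise ValueError("extension capability names must contain ':'")
    always_match = {
        "browserName": browser_name,
        vendor_key: dict(vendor_options),
    }
    return {"capabilities": {"alwaysMatch": always_match, "firstMatch": [{}]}}


payload = build_new_session_payload(
    "htmlunit",
    "example:options",  # placeholder key, NOT confirmed for HtmlUnit Remote
    {"throwExceptionOnScriptError": False},
)
print(json.dumps(payload, indent=2))
```

Whether the server honours the nested settings depends on the driver's own documentation; the sketch only shows the shape that passes W3C name validation.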