Open luorixiangyang opened 1 year ago
Looks like there is something missing ;-) will have a deeper look
As a workaround can you please try something like
String PROXY_HOST = ....;
int PROXY_PORT = .....
WebDriver webDriver = new HtmlUnitDriver(BrowserVersion.FIREFOX, true) {
    @Override
    protected WebClient modifyWebClient(WebClient client) {
        final WebClient webClient = super.modifyWebClient(client);
        webClient.getOptions().setProxyConfig(new ProxyConfig(PROXY_HOST, PROXY_PORT, null));
        final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
        credentialsProvider.addCredentials("username", "password", PROXY_HOST, PROXY_PORT);
        return webClient;
    }
};
Here are the details. The pom.xml dependency is as follows:
...
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>htmlunit-driver</artifactId>
<version>4.10.0</version>
</dependency>
...
The source code is as follows:
public static WebDriver createProxyWebDriver() {
    String PROXY_HOST = ProxyHost;
    int PROXY_PORT = ProxyPort;
    // configure the WebDriver to go through the authenticated proxy
    WebDriver webDriver = new HtmlUnitDriver(BrowserVersion.FIREFOX, true) {
        @Override
        protected WebClient modifyWebClient(WebClient client) {
            final WebClient webClient = super.modifyWebClient(client);
            webClient.getOptions().setProxyConfig(new ProxyConfig(PROXY_HOST, PROXY_PORT, null));
            final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient
                    .getCredentialsProvider();
            credentialsProvider.addCredentials(ProxyUser, ProxyPass, PROXY_HOST, PROXY_PORT, null);
            return webClient;
        }
    };
    return webDriver;
}
public static String getPageOnDynamicWeb(String url) {
    WebDriver client = createProxyWebDriver();
    client.get(url);
    String response = client.getPageSource();
    client.close();
    return response;
}
public static void main(String[] args) throws Exception {
    String response = "";
    // the target url
    String url = "https://developer.apple.com/documentation/accelerate/bnns/shape/3656199-init";
    response = getPageOnDynamicWeb(url);
    ClearInnerToWriteFile(
            "/home/luori/_fly/workspaces/javaworkspace/selenium-base/logs/apple_api_page_html.html",
            response);
}
Running the code above throws the following exception:
......
Caused by: net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: invalid property id (https://developer.apple.com/tutorials/js/chunk-vendors.fc64ed7e.js#10)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory$HtmlUnitErrorReporter.error(HtmlUnitContextFactory.java:435)
    at net.sourceforge.htmlunit.corejs.javascript.Parser.addError(Parser.java:257)
    at net.sourceforge.htmlunit.corejs.javascript.Parser.reportError(Parser.java:336)
    at net.sourceforge.htmlunit.corejs.javascript.Parser.reportError(Parser.java:327)
    at net.sourceforge.htmlunit.corejs.javascript.Parser.reportError(Parser.java:320)
    at net.sourceforge.htmlunit.corejs.javascript.Parser.objectLiteral(Parser.java:3499)
......
Environment:
OS: Ubuntu-server 18.04
google-chrome: Google Chrome 114.0.5735.133
ChromeDriver: 114.0.5735.90
JDK: 1.8.0_271
Please try the target URL https://developer.apple.com/documentation/accelerate/bnns/shape/3656199-init to test the correct approach.
Thanks!
I need to point out that https://developer.apple.com/documentation/accelerate/bnns/shape/3656199-init is dynamic web content, so its JavaScript has to be executed during scraping. I can get the static web content but can't capture the dynamic parts.
Had a deeper look and there are several problems with this page. Long story short: HtmlUnit does not support the whole modern JavaScript syntax (because it is based on Rhino). We are working on improving this, but I fear there will be no real progress until the end of this year.
Two options: help us to improve Rhino, or use Selenium with real browsers.
Got it! I will also check whether I can contribute to HtmlUnit to improve this.
How do I set up proxy authorization with a username/password in an Ubuntu-server 18.04 environment? I found lots of examples, but none of them solve my requirement of scraping a page like https://developer.apple.com/documentation/accelerate/bnns/shape/3656199-init
thanks!
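For the proxy username/password part of the question on its own, here is a minimal sketch that uses only the JDK (java.net.Authenticator), independent of HtmlUnit and Selenium. It does not execute JavaScript, so it only fetches the static HTML and does not solve the dynamic-content problem discussed above. The class name ProxyAuthSketch, its method names, and any host, port, or credential values passed to it are hypothetical placeholders, not something taken from this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.URL;

public class ProxyAuthSketch {

    /** Builds an HTTP proxy descriptor without resolving the host name yet. */
    public static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    /** Registers a JVM-wide authenticator that answers only proxy challenges from host:port. */
    public static void installProxyCredentials(final String host, final int port,
                                               final String user, final String password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                boolean forProxy = getRequestorType() == RequestorType.PROXY;
                boolean sameEndpoint = host.equalsIgnoreCase(getRequestingHost())
                        && port == getRequestingPort();
                if (forProxy && sameEndpoint) {
                    return new PasswordAuthentication(user, password.toCharArray());
                }
                return null; // do not hand the credentials to any other server
            }
        });
    }

    /** Fetches the raw (static) response body of a URL through the given proxy. */
    public static String fetch(String url, Proxy proxy) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection(proxy);
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would be to call installProxyCredentials(...) once and then fetch(url, httpProxy(host, port)). One thing to be aware of on recent JDK 8 updates (8u111 and later): Basic authentication for HTTPS tunneling through a proxy is disabled by default, so for an https:// target the JVM likely needs to be started with -Djdk.http.auth.tunneling.disabledSchemes="" for the authenticator to be consulted.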