What steps will reproduce the problem?
1. Set proxy settings in CrawlConfig
2. Add BasicAuthInfo to CrawlConfig (a configuration sketch follows below)
3. Try to crawl a site with basic authentication
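For steps 1 and 2, the configuration looks roughly like the sketch below. Host, port, credentials and URLs are placeholders, and the method names follow the crawler4j 4.0 API as far as I know:

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.authentication.BasicAuthInfo;

CrawlConfig config = new CrawlConfig();
config.setCrawlStorageFolder("/tmp/crawler4j");   // placeholder storage folder
// Proxy settings (placeholder host/port)
config.setProxyHost("proxy.example.com");
config.setProxyPort(8080);
// Basic authentication for the crawled site (placeholder credentials and URL;
// the BasicAuthInfo constructor can throw MalformedURLException)
config.addAuthInfo(new BasicAuthInfo("user", "password", "http://example.com/"));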
What is the expected output? What do you see instead?
The crawler should crawl the URL and fetch the data.
Instead, the crawler cannot connect, because the proxy settings are lost.
What version of the product are you using?
4.0
Please provide any additional information below.
The code in PageFetcher.java must be changed. Currently the proxy settings (and
possibly other configuration) are lost when performing basic authentication:
the method PageFetcher.doBasicLogin(BasicAuthInfo authInfo) builds a brand-new
HttpClient, replacing the one that was configured from the CrawlConfig.
/**
 * BASIC authentication<br/>
 * Official example:
 * https://hc.apache.org/httpcomponents-client-ga/httpclient/examples/org/apache/http/examples/client/ClientAuthentication.java
 */
protected void doBasicLogin(BasicAuthInfo authInfo) {
    HttpHost targetHost = new HttpHost(authInfo.getHost(), authInfo.getPort(), authInfo.getProtocol());
    CredentialsProvider credsProvider = new BasicCredentialsProvider();
    credsProvider.setCredentials(
        new AuthScope(targetHost.getHostName(), targetHost.getPort()),
        new UsernamePasswordCredentials(authInfo.getUsername(), authInfo.getPassword()));
    // Problem: this builds a brand-new client, silently discarding the proxy
    // (and any other settings) that were applied when the fetcher was created.
    httpClient = HttpClients.custom().setDefaultCredentialsProvider(credsProvider).build();
}
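One way to fix this would be to re-apply the proxy from the CrawlConfig when the
authenticated client is rebuilt, instead of starting from a bare builder. A
minimal sketch, assuming the fetcher keeps its CrawlConfig in a field named
config (hypothetical naming; HttpClientBuilder is
org.apache.http.impl.client.HttpClientBuilder):

protected void doBasicLogin(BasicAuthInfo authInfo) {
    HttpHost targetHost = new HttpHost(authInfo.getHost(), authInfo.getPort(), authInfo.getProtocol());
    CredentialsProvider credsProvider = new BasicCredentialsProvider();
    credsProvider.setCredentials(
        new AuthScope(targetHost.getHostName(), targetHost.getPort()),
        new UsernamePasswordCredentials(authInfo.getUsername(), authInfo.getPassword()));

    HttpClientBuilder builder = HttpClients.custom()
        .setDefaultCredentialsProvider(credsProvider);
    // Re-apply the proxy so it survives the rebuild ("config" is assumed to be
    // the CrawlConfig the PageFetcher was constructed with).
    if (config.getProxyHost() != null) {
        builder.setProxy(new HttpHost(config.getProxyHost(), config.getProxyPort()));
    }
    httpClient = builder.build();
}

An even cleaner variant would be to keep the HttpClientBuilder that the
PageFetcher constructor already configures and reuse it here, so that all
settings survive, not just the proxy.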
Original issue reported on code.google.com by wefwefw...@gmail.com on 6 Jan 2015 at 11:19