bejean / crawl-anywhere

Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
www.crawl-anywhere.com
Apache License 2.0

proxy params are ignored #44

Closed: torhar closed this issue 9 years ago

torhar commented 11 years ago

If proxy parameters are defined in crawler.xml, they are only used to create authentication cookies in the initialize method of class WebConnector.java. They are never passed on to class WebPageLoader.java, nor from there to HttpLoader.java, which is what would be needed to route all requests through the proxy, including when no authentication mechanism is used. As a result, the proxy settings are effectively ignored.
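A minimal sketch of the propagation described above, assuming the proxy host and port have already been read from crawler.xml. `ProxySettings` and the stripped-down `WebConnector`, `WebPageLoader` and `HttpLoader` classes are hypothetical stand-ins, not the project's actual classes. Only `java.net.Proxy` and `URL.openConnection(Proxy)` are real JDK API. The idea is that the settings travel down through the constructors instead of stopping at cookie setup.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;

public class ProxyPropagationSketch {

    // Hypothetical holder for the proxy host/port read from crawler.xml.
    static final class ProxySettings {
        final String host;
        final int port;

        ProxySettings(String host, int port) {
            this.host = host;
            this.port = port;
        }

        Proxy toProxy() {
            if (host == null || host.isEmpty()) {
                return Proxy.NO_PROXY; // nothing configured: connect directly
            }
            // createUnresolved defers the DNS lookup of the proxy host until a connection is made
            return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
        }
    }

    // Hypothetical stand-in for HttpLoader: every connection it opens uses the proxy it was given.
    static final class HttpLoader {
        private final Proxy proxy;

        HttpLoader(Proxy proxy) { this.proxy = proxy; }

        Proxy proxy() { return proxy; }

        HttpURLConnection open(URL url) throws IOException {
            // openConnection(Proxy) routes this connection through the proxy;
            // no network I/O happens until connect() or getInputStream() is called.
            return (HttpURLConnection) url.openConnection(proxy);
        }
    }

    // Hypothetical stand-in for WebPageLoader: it only hands the proxy down to its HttpLoader.
    static final class WebPageLoader {
        private final HttpLoader httpLoader;

        WebPageLoader(ProxySettings settings) { this.httpLoader = new HttpLoader(settings.toProxy()); }

        HttpLoader httpLoader() { return httpLoader; }
    }

    // Hypothetical stand-in for WebConnector: besides any auth-cookie setup,
    // it passes the same settings to the page loader it creates.
    static final class WebConnector {
        private final WebPageLoader pageLoader;

        WebConnector(ProxySettings settings) { this.pageLoader = new WebPageLoader(settings); }

        WebPageLoader pageLoader() { return pageLoader; }
    }

    public static void main(String[] args) throws IOException {
        WebConnector connector = new WebConnector(new ProxySettings("proxy.example.com", 3128));
        HttpLoader loader = connector.pageLoader().httpLoader();
        // Opening (but not connecting) is enough to show the proxy is attached to the request.
        HttpURLConnection conn = loader.open(URI.create("http://www.example.org/").toURL());
        System.out.println("Loader proxy: " + loader.proxy() + ", request: " + conn.getURL());
    }
}
```

Constructor injection keeps the proxy decision in one place (crawler.xml parsing), so the lower layers never need to read configuration themselves.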

bejean commented 9 years ago

Note: a proxy address exclusion list is not supported (this would require an HTTP client library upgrade).

See: Proxy address exclusion list #76
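For reference, an address exclusion list means that requests to certain hosts bypass the proxy and connect directly. The following is a minimal JDK-only sketch of that behaviour using the real `java.net.ProxySelector` API. It is not the HTTP client library the project uses. The class name and the pattern syntax (exact host names, or a leading `*` matching any prefix) are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.List;
import java.util.Locale;

// Hypothetical selector: uses the given proxy except for hosts on the exclusion list.
public class ExclusionListProxySelector extends ProxySelector {
    private final Proxy proxy;
    private final List<String> exclusions; // e.g. "localhost", "*.intranet.example.com" (assumed syntax)

    public ExclusionListProxySelector(Proxy proxy, List<String> exclusions) {
        this.proxy = proxy;
        this.exclusions = List.copyOf(exclusions);
    }

    // True if the host equals an entry, or ends with the suffix after a leading "*".
    boolean isExcluded(String host) {
        if (host == null) {
            return false;
        }
        String h = host.toLowerCase(Locale.ROOT);
        for (String pattern : exclusions) {
            String p = pattern.toLowerCase(Locale.ROOT);
            boolean match = p.startsWith("*") ? h.endsWith(p.substring(1)) : h.equals(p);
            if (match) {
                return true;
            }
        }
        return false;
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        // Excluded hosts get a direct connection; everything else goes through the proxy.
        return isExcluded(uri.getHost()) ? List.of(Proxy.NO_PROXY) : List.of(proxy);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // A real crawler might log this or retry directly; this sketch ignores it.
    }
}
```

Such a selector could be installed globally with `ProxySelector.setDefault(...)`, which `HttpURLConnection` consults when `openConnection()` is called without an explicit `Proxy`, or passed to `java.net.http.HttpClient.newBuilder().proxy(...)`.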