What steps will reproduce the problem?
1. Create a job with Quartz
2. Run the crawler4j controller from the Quartz job's execute() method
3. Execute the job
Hi,
I'm trying to combine the Quartz scheduler with crawler4j.
When I run the crawler4j code from a main method it works fine, but when the same
code runs inside the Quartz job's execute() method I get an HTTP connection error.
We are working behind a proxy, but the proxy is already configured within crawler4j,
and we also tried configuring it in Quartz.
Do you know whether Quartz can block the HTTP connection?
Error stack trace:
Exception in thread "Crawler 1" java.lang.NoSuchFieldError: DEF_PROTOCOL_CHARSET
at org.apache.http.auth.params.AuthParams.getCredentialCharset(AuthParams.java:64)
at org.apache.http.impl.auth.BasicScheme.authenticate(BasicScheme.java:157)
at org.apache.http.client.protocol.RequestAuthenticationBase.authenticate(RequestAuthenticationBase.java:125)
at org.apache.http.client.protocol.RequestAuthenticationBase.process(RequestAuthenticationBase.java:83)
at org.apache.http.client.protocol.RequestProxyAuthentication.process(RequestProxyAuthentication.java:89)
at org.apache.http.protocol.ImmutableHttpProcessor.process(ImmutableHttpProcessor.java:108)
at org.apache.http.protocol.HttpRequestExecutor.preProcess(HttpRequestExecutor.java:174)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:515)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at edu.uci.ics.crawler4j.fetcher.PageFetcher.fetchHeader(PageFetcher.java:156)
at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:232)
at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:189)
at java.lang.Thread.run(Thread.java:662)
The execute() method:
@Override
public void execute(JobExecutionContext context) throws JobExecutionException {
    JobKey key = context.getJobDetail().getKey();
    JobDataMap dataMap = context.getJobDetail().getJobDataMap();
    String[] sitesTab = dataMap.getString("sites").split(";");

    int numberOfCrawlers = 2;
    String storageFolder = "C:\\...";

    // Crawl configuration, including the proxy settings
    CrawlConfig config = new CrawlConfig();
    config.setProxyHost("...");
    config.setProxyPort(3128);
    config.setProxyUsername("...");
    config.setProxyPassword("...");
    config.setMaxDepthOfCrawling(2);
    config.setCrawlStorageFolder(storageFolder);
    config.setIncludeBinaryContentInCrawling(true);

    String[] crawlDomains = new String[] { "http://www.....fr/" };

    PageFetcher pageFetcher = new PageFetcher(config);
    RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
    robotstxtConfig.setEnabled(false);
    RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);

    CrawlController controller;
    try {
        controller = new CrawlController(config, pageFetcher, robotstxtServer);
        for (String domain : crawlDomains) {
            controller.addSeed(domain);
        }

        // Static configuration of the crawler class (image size filter)
        int minWidth = 150;
        int minHeight = 150;
        Pattern p = Pattern.compile(".*(\\.(bmp|gif|jpe?g|png))$");
        SportifsWebCrawler.configure(crawlDomains, storageFolder, p, minWidth, minHeight);

        // Blocking call: runs the crawl with the given number of crawler threads
        controller.start(SportifsWebCrawler.class, numberOfCrawlers);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
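For reference, the job is wired up and scheduled roughly like this (a minimal sketch using the Quartz 2.x API; the job class name SportifsCrawlJob, the trigger identity, and the example site URLs are placeholders, only the "sites" JobDataMap key matches the execute() code above):

import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class CrawlScheduler {
    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

        // SportifsCrawlJob is a placeholder name for the Job class whose execute() is shown above
        JobDetail job = JobBuilder.newJob(SportifsCrawlJob.class)
                .withIdentity("crawlJob", "crawlGroup")
                // read in execute() via dataMap.getString("sites").split(";")
                .usingJobData("sites", "http://site1.fr;http://site2.fr")
                .build();

        // Fire once immediately (placeholder schedule)
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("crawlTrigger", "crawlGroup")
                .startNow()
                .withSchedule(SimpleScheduleBuilder.simpleSchedule())
                .build();

        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}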
Thanks for helping :)
Original issue reported on code.google.com by stratege...@gmail.com on 5 Feb 2014 at 1:30