Closed GoogleCodeExporter closed 9 years ago
Can you share your MyCrawler code?
Original comment by ganjisaffar@gmail.com
on 11 Mar 2013 at 5:06
The MyCrawler code is nothing new. It is the same as the code published on the
Project Home Page, but I am pasting it here anyway. Please help me out.
import java.util.List;
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    // URLs ending in these extensions (stylesheets, scripts, media, archives) are skipped.
    private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    /**
     * You should implement this function to specify whether
     * the given url should be crawled or not (based on your
     * crawling logic).
     */
    @Override
    public boolean shouldVisit(WebURL url) {
        String href = url.getURL().toLowerCase();
        return !FILTERS.matcher(href).matches() && href.startsWith("http://www.ics.uci.edu/");
    }

    /**
     * This function is called when a page is fetched and ready
     * to be processed by your program.
     */
    @Override
    public void visit(Page page) {
        String url = page.getWebURL().getURL();
        System.out.println("URL: " + url);

        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
            String text = htmlParseData.getText();
            String html = htmlParseData.getHtml();
            List<WebURL> links = htmlParseData.getOutgoingUrls();

            System.out.println("Text length: " + text.length());
            System.out.println("Html length: " + html.length());
            System.out.println("Number of outgoing links: " + links.size());
        }
    }
}
Original comment by pavanrav...@gmail.com
on 11 Mar 2013 at 5:17
[deleted comment]
I'm using Eclipse Helios.
Original comment by pavanrav...@gmail.com
on 11 Mar 2013 at 5:20
Can you zip your project folder and attach it here?
Original comment by ganjisaffar@gmail.com
on 11 Mar 2013 at 5:21
I also tried using BasicController, where I got a NoClassDefFoundError for the
PoolingConnection class in org.apache.bla.bla...
Here is the project .rar file.
Original comment by pavanrav...@gmail.com
on 11 Mar 2013 at 5:25
I see the following issues with your code:
- You need to compile with Java 7 instead of Java 6.
- You need to use the latest dependencies package instead of the old one. The latest one is here: https://code.google.com/p/crawler4j/downloads/detail?name=crawler4j-3.5-dependencies.zip#makechanges
- You shouldn't have a copy of the WebCrawler class in your source folder. This is the source of the error you're getting.
Original comment by ganjisaffar@gmail.com
on 11 Mar 2013 at 5:35
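One way to confirm the third point is to ask the JVM where it actually loaded a class from. The following is a minimal sketch, not part of crawler4j: the class name `ClassOrigin` and the helper `originOf` are made up for illustration. Run with `edu.uci.ics.crawler4j.crawler.WebCrawler` as the argument, it should print the crawler4j JAR if the library copy is in use, or the project's output folder if a stray copy is shadowing it.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /** Returns the location a class was loaded from, or a note if the JVM does not expose one. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Typically the case for classes that ship with the JDK itself.
            return "(no code source available)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> type = Class.forName(name);
        System.out.println(name + " -> " + originOf(type));
    }
}
```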
Then where should I place the WebCrawler class, if not in the source
directory?
Original comment by pavanrav...@gmail.com
on 11 Mar 2013 at 5:37
Thank you a lot, it's working. Your prompt reply is impressive.
Original comment by pavanrav...@gmail.com
on 11 Mar 2013 at 5:42
I am now getting another exception. Here are the details:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/http/impl/conn/PoolingClientConnectionManager
at edu.uci.ics.crawler4j.fetcher.PageFetcher.<init>(PageFetcher.java:101)
at Controller.main(Controller.java:18)
Caused by: java.lang.ClassNotFoundException:
org.apache.http.impl.conn.PoolingClientConnectionManager
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 2 more
Original comment by pavanrav...@gmail.com
on 11 Mar 2013 at 5:43
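The trace above says that `org.apache.http.impl.conn.PoolingClientConnectionManager` could not be found at runtime, which is consistent with the earlier advice to switch to the latest dependencies package. As far as I know, that class first appeared in Apache HttpClient 4.2, so an older or missing httpclient JAR on the build path is a plausible cause, though the thread does not confirm it. A minimal sketch for checking this before starting the crawler follows; `ClasspathCheck` and `isOnClasspath` are hypothetical names, not crawler4j API.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised while resolving dependencies.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this thread.
        String name = "org.apache.http.impl.conn.PoolingClientConnectionManager";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found on the classpath"));
    }
}
```

If the check reports the class as missing, the JARs on the project's build path are the first thing to inspect.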
Hi, can you please help me? I am stuck on the same issue: the bound
mismatch on the controller.start line of code. If not in the source
folder, where should I place my WebCrawler code? Please help me.
Original comment by contactn...@gmail.com
on 22 Jun 2014 at 4:42
Hi,
The class 'WebCrawler' is already present in the JAR file that we are
including.
If you have created a new copy of this class in your project, please delete it.
The copy you created conflicts with the class in the JAR file.
Thank You,
Pavan Ravikanth.
Original comment by pavanrav...@gmail.com
on 23 Jun 2014 at 3:25
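To illustrate why a stray copy causes a bound mismatch, here is a self-contained sketch that uses stand-in classes rather than crawler4j itself. It assumes that `start` is declared with a type parameter bounded by the library's own `WebCrawler`; every name below (`Library`, `LocalCopy`, `GoodCrawler`, `BadCrawler`, `satisfiesBound`) is hypothetical. A crawler that extends the local copy has the right simple name but is a different type, so it falls outside the bound.

```java
public class BoundMismatchDemo {

    // Stand-in for the library: its own WebCrawler and a start method bounded by it.
    static class Library {
        static class WebCrawler {}

        static <T extends WebCrawler> String start(Class<T> crawlerType) {
            return "started " + crawlerType.getSimpleName();
        }
    }

    // Stand-in for a stray copy of WebCrawler sitting in the project's source folder.
    static class LocalCopy {
        static class WebCrawler {}
    }

    static class GoodCrawler extends Library.WebCrawler {}
    static class BadCrawler extends LocalCopy.WebCrawler {}

    /** Runtime equivalent of the compile-time bound check. */
    static boolean satisfiesBound(Class<?> crawlerType) {
        return Library.WebCrawler.class.isAssignableFrom(crawlerType);
    }

    public static void main(String[] args) {
        System.out.println(Library.start(GoodCrawler.class));
        // Library.start(BadCrawler.class); // would not compile: BadCrawler is outside the bound
        System.out.println("GoodCrawler within bound: " + satisfiesBound(GoodCrawler.class));
        System.out.println("BadCrawler within bound: " + satisfiesBound(BadCrawler.class));
    }
}
```

Deleting the local copy makes the crawler resolve against the library's class again, which matches the fix described in this thread.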
Hi, thank you very much, that solved the issue. But when I run my program
I get this error, which I do not know how to solve. Can you please help
me?
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at Controller.main(Controller.java:28)
How should I solve this issue? Please help.
Original comment by contactn...@gmail.com
on 24 Jun 2014 at 12:13
Original issue reported on code.google.com by
pavanrav...@gmail.com
on 10 Mar 2013 at 6:58