asepaprianto / crawler4j

Automatically exported from code.google.com/p/crawler4j

Bound mismatch: The generic method start(Class<T>, int) of type CrawlController is not applicable for the arguments (Class<MyCrawler>, int). The inferred type MyCrawler is not a valid substitute for the bounded parameter <T extends WebCrawler> #203

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Execute the program itself.

What is the expected output? What do you see instead?

What version of the product are you using?
3.5

Please provide any additional information below.

The error is shown at this line: controller.start(MyCrawler.class, numberOfCrawlers);

Here is the Controller code:

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class Controller {
        public static void main(String[] args) throws Exception {
                String crawlStorageFolder = "/data/crawl/roo";
                int numberOfCrawlers = 7;

                CrawlConfig config = new CrawlConfig();
                config.setCrawlStorageFolder(crawlStorageFolder);

                /*
                 * Instantiate the controller for this crawl.
                 */
                PageFetcher pageFetcher = new PageFetcher(config);
                RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
                RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
                CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

                /*
                 * For each crawl, you need to add some seed urls. These are the first
                 * URLs that are fetched and then the crawler starts following links
                 * which are found in these pages
                 */
                //controller.addSeed("http://www.ics.uci.edu/~welling/");
                //controller.addSeed("http://www.ics.uci.edu/~lopes/");
                //controller.addSeed("http://www.ics.uci.edu/");

                controller.start(MyCrawler.class, numberOfCrawlers);
        }
}

Original issue reported on code.google.com by pavanrav...@gmail.com on 10 Mar 2013 at 6:58

GoogleCodeExporter commented 9 years ago
Can you share your MyCrawler code?

Original comment by ganjisaffar@gmail.com on 11 Mar 2013 at 5:06

GoogleCodeExporter commented 9 years ago
The MyCrawler code is nothing new. It is the same as the code published on the project home page. I am pasting it here anyway. Help me out.

import java.util.List;
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

        private final static Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g" 
                                                          + "|png|tiff?|mid|mp2|mp3|mp4"
                                                          + "|wav|avi|mov|mpeg|ram|m4v|pdf" 
                                                          + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

        /**
         * You should implement this function to specify whether
         * the given url should be crawled or not (based on your
         * crawling logic).
         */
        @Override
        public boolean shouldVisit(WebURL url) {
                String href = url.getURL().toLowerCase();
                return !FILTERS.matcher(href).matches() && href.startsWith("http://www.ics.uci.edu/");
        }

        /**
         * This function is called when a page is fetched and ready 
         * to be processed by your program.
         */
        @Override
        public void visit(Page page) {          
                String url = page.getWebURL().getURL();
                System.out.println("URL: " + url);

                if (page.getParseData() instanceof HtmlParseData) {
                        HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
                        String text = htmlParseData.getText();
                        String html = htmlParseData.getHtml();
                        List<WebURL> links = htmlParseData.getOutgoingUrls();

                        System.out.println("Text length: " + text.length());
                        System.out.println("Html length: " + html.length());
                        System.out.println("Number of outgoing links: " + links.size());
                }
        }
}

Original comment by pavanrav...@gmail.com on 11 Mar 2013 at 5:17

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I'm using Eclipse Helios.

Original comment by pavanrav...@gmail.com on 11 Mar 2013 at 5:20

GoogleCodeExporter commented 9 years ago
Can you zip your project folder and attach it here?

Original comment by ganjisaffar@gmail.com on 11 Mar 2013 at 5:21

GoogleCodeExporter commented 9 years ago
I also tried using BasicController where i suffered NoClassDefFound for 
PoolingConnection class in org.apache.bla.bla...

Here is the Project .rar File

Original comment by pavanrav...@gmail.com on 11 Mar 2013 at 5:25

Attachments:

GoogleCodeExporter commented 9 years ago
I see the following issues with your code:
 - You need to compile with Java 7 instead of Java 6.
 - You need to use the latest dependencies package instead of the old one. The latest one is here: https://code.google.com/p/crawler4j/downloads/detail?name=crawler4j-3.5-dependencies.zip#makechanges
 - You shouldn't have a copy of the WebCrawler class in your source folder. This is the source of the error you're getting; a minimal sketch of the fix is below.
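
A minimal sketch of the fixed class, assuming the only WebCrawler on the build path is the one inside the crawler4j 3.5 jar (method bodies elided; see the full MyCrawler pasted earlier in this thread):

// Delete any WebCrawler.java copied into your own source folder and extend the
// class shipped with crawler4j instead. MyCrawler then satisfies the bound
// <T extends WebCrawler> that CrawlController.start(Class<T>, int) expects.
import edu.uci.ics.crawler4j.crawler.WebCrawler;

public class MyCrawler extends WebCrawler {
        // shouldVisit(WebURL) and visit(Page) overrides exactly as pasted above
}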

Original comment by ganjisaffar@gmail.com on 11 Mar 2013 at 5:35

GoogleCodeExporter commented 9 years ago
Then where should I place the WebCrawler class, if not in the source directory?

Original comment by pavanrav...@gmail.com on 11 Mar 2013 at 5:37

GoogleCodeExporter commented 9 years ago
Thank you a lot, it's working. Your quick reply is impressive...

Original comment by pavanrav...@gmail.com on 11 Mar 2013 at 5:42

GoogleCodeExporter commented 9 years ago
I'm getting another exception. Here are the details:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/http/impl/conn/PoolingClientConnectionManager
    at edu.uci.ics.crawler4j.fetcher.PageFetcher.<init>(PageFetcher.java:101)
    at Controller.main(Controller.java:18)
Caused by: java.lang.ClassNotFoundException: org.apache.http.impl.conn.PoolingClientConnectionManager
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 2 more

Original comment by pavanrav...@gmail.com on 11 Mar 2013 at 5:43
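
The missing class is part of Apache HttpClient, which ships in the crawler4j-3.5-dependencies.zip linked above, so those jars are most likely absent from the runtime classpath. A small, hypothetical check (the ClasspathCheck class name is illustrative, not from this thread) that prints whether they are visible at runtime:

public class ClasspathCheck {
        public static void main(String[] args) {
                try {
                        // crawler4j 3.5's PageFetcher constructs this HttpClient class (see the
                        // stack trace above), so it must be resolvable at runtime, not just at
                        // compile time.
                        Class.forName("org.apache.http.impl.conn.PoolingClientConnectionManager");
                        System.out.println("Apache HttpClient found on the classpath.");
                } catch (ClassNotFoundException e) {
                        System.out.println("httpclient/httpcore jars from the crawler4j "
                                        + "dependencies package are missing from the classpath.");
                }
        }
}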

GoogleCodeExporter commented 9 years ago
Hi, can you please help me? I am stuck with the same issue: the bound mismatch on the controller.start line. If not in the source folder, where should I place my WebCrawler code? Please help me.

Original comment by contactn...@gmail.com on 22 Jun 2014 at 4:42

GoogleCodeExporter commented 9 years ago
Hi,

The class 'WebCrawler' is already present in the JAR file that we are including.

Please delete any new copy of this class that you created in your project; it conflicts with the class present in the JAR file.

Thank You,
Pavan Ravikanth.

Original comment by pavanrav...@gmail.com on 23 Jun 2014 at 3:25
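
If the bound mismatch ever comes back, a quick hypothetical check (WhichWebCrawler is an illustrative name, not part of the original thread) is to print which WebCrawler your MyCrawler actually extends:

public class WhichWebCrawler {
        public static void main(String[] args) {
                // Anything other than edu.uci.ics.crawler4j.crawler.WebCrawler here means a
                // local copy of the class is still shadowing the one in the crawler4j jar.
                System.out.println(MyCrawler.class.getSuperclass().getName());
        }
}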

GoogleCodeExporter commented 9 years ago
Hi, thank you very much, that solved that issue. But when I run my program I get this error, which I do not know how to solve. Can you please help me?

at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
    at Controller.main(Controller.java:28)
How should I solve this issue? Please help...

Original comment by contactn...@gmail.com on 24 Jun 2014 at 12:13