-
```
child URL missing in Printf
```
Original issue reported on code.google.com by `ort...@gmail.com` on 20 Jul 2014 at 4:16
Attachments:
- [webcrawler.go.patch](https://storage.googleapis.com/google…
-
```
It doesn't reduce the depth when calling crawl recursively, so the depth
limit is currently ineffective.
```
Original issue reported on code.google.com by `joelai85` on 26 Oct 2012 at 6:18
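The general shape of the fix (a sketch only, with hypothetical names and a toy link graph, not the project's actual code): decrement the depth on each recursive call so the limit actually takes effect.

```java
import java.util.*;

public class CrawlDepth {
    // Toy link graph standing in for fetched pages (assumption, for illustration).
    static final Map<String, List<String>> links = Map.of(
        "a", List.of("b"),
        "b", List.of("c"),
        "c", List.of("d"),
        "d", List.of());

    public static final Set<String> visited = new LinkedHashSet<>();

    public static void crawl(String url, int depth) {
        if (depth <= 0 || !visited.add(url)) return;
        for (String next : links.getOrDefault(url, List.of())) {
            // The fix: pass depth - 1 here; the reported bug passed depth unchanged,
            // so the recursion never approached the limit.
            crawl(next, depth - 1);
        }
    }

    public static void main(String[] args) {
        crawl("a", 2);
        System.out.println(visited); // prints [a, b]
    }
}
```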
-
```
While crawling the seed http://eventiesagre.it/ I obtain the internal error
reported below.
I guess the issue is that the crawler finds a URL without a trailing / .
Processing page: [http://eventies…
```
-
```
What steps will reproduce the problem?
1. Just start a crawl for any site...
2.
3.
What is the expected output? What do you see instead?
process the page as per the WebCrawler's process method
What ve…
```
-
```
We should add better hooks in the WebCrawler so that we can better handle
the various errors that occur while crawling a given URL.
```
Original issue reported on code.google.com by `avrah...@gmail.com` …
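One possible shape for such hooks (a sketch; the method names and signatures here are hypothetical illustrations, not crawler4j's actual API): protected no-op callbacks on the crawler base class that subclasses override only where they need custom error handling.

```java
// Sketch of error hooks a crawler base class could expose (hypothetical
// names, not crawler4j's real API).
public class WebCrawlerSketch {
    protected void onFetchError(String url, int statusCode) {
        // default: do nothing; subclasses may log, retry, or blacklist the URL
    }

    protected void onParseError(String url, Exception cause) {
        // default: do nothing
    }

    // The fetch loop reports failures through the hooks instead of
    // swallowing them internally.
    public final void visit(String url) {
        int status = fetch(url); // stub fetch below
        if (status >= 400) {
            onFetchError(url, status);
            return;
        }
        // ... parse page, follow links ...
    }

    private int fetch(String url) {
        return url.isEmpty() ? 404 : 200; // stand-in for a real HTTP fetch
    }

    public static void main(String[] args) {
        WebCrawlerSketch c = new WebCrawlerSketch() {
            @Override protected void onFetchError(String url, int status) {
                System.out.println("fetch failed: " + url + " (" + status + ")");
            }
        };
        c.visit("");
    }
}
```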
-
```
What steps will reproduce the problem?
1. set max size to anything reasonable; I'm using 1 MB
2. start crawling from
http://www.ics.uci.edu/~yil8/public_data/PyLOH/?C=S%3BO%3DA
3. watch console
Wh…
```
-
```
What steps will reproduce the problem?
java.lang.NullPointerException
at edu.uci.ics.crawler4j.frontier.DocIDServer.getDocID(DocIDServer.java:70)
at edu.uci.ics.crawler4j.crawler.WebCrawle…
```
-
```
What steps will reproduce the problem?
1. Take the simple crawler example; remove all calls to controller.addSeed()
and replace with this one
controller.addSeed("http://dairymix.com/");
2. This …
```
-
```
PageFetcher.Fetch(Page page) is currently being used by all crawler threads as
a utility class, and it has become a bottleneck. Instead, why don't you put an
instance of PageFetcher as an instance v…
```
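The change being requested, sketched with simplified, hypothetical signatures (not crawler4j's real classes): give each crawler its own `PageFetcher` field, so threads no longer funnel every fetch through one shared utility method.

```java
// Sketch: per-crawler fetcher instance instead of a shared static utility
// (simplified, hypothetical signatures for illustration).
class PageFetcher {
    // Each instance can hold its own connection state, so threads do not
    // contend on a single shared method.
    public String fetch(String url) {
        return "<html>stub for " + url + "</html>"; // stand-in for a real HTTP fetch
    }
}

class Crawler implements Runnable {
    private final PageFetcher fetcher = new PageFetcher(); // instance field, not static
    private final String seed;

    Crawler(String seed) { this.seed = seed; }

    @Override public void run() {
        System.out.println(fetcher.fetch(seed).length());
    }
}

public class PerThreadFetcher {
    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(new Crawler("http://example.com/a"));
        Thread t2 = new Thread(new Crawler("http://example.com/b"));
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```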
-
```
What steps will reproduce the problem?
1. Run the crawler for a domain that has a robots.txt file with an 'allow:'
instruction (for example http://www.explido-webmarketing.de/)
What is the expected output?…
```
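For background on why an `Allow:` line matters, here is a simplified sketch of the longest-match rule from the robots.txt convention (illustration only, not crawler4j's implementation): among all rules whose path prefix matches the URL path, the longest match wins, and an Allow rule beats a Disallow rule of equal length.

```java
import java.util.*;

// Simplified robots.txt check (illustration only).
public class RobotsCheck {
    record Rule(boolean allow, String prefix) {}

    public static boolean isAllowed(List<Rule> rules, String path) {
        Rule best = null;
        for (Rule r : rules) {
            if (path.startsWith(r.prefix)) {
                // Longest matching prefix wins; Allow wins ties.
                if (best == null
                        || r.prefix.length() > best.prefix.length()
                        || (r.prefix.length() == best.prefix.length() && r.allow)) {
                    best = r;
                }
            }
        }
        return best == null || best.allow; // no matching rule => allowed
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule(false, "/private/"),
            new Rule(true,  "/private/public/"));
        System.out.println(isAllowed(rules, "/private/public/page.html")); // true
        System.out.println(isAllowed(rules, "/private/secret.html"));      // false
    }
}
```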