-
I tried to crawl with the same configuration a second time, and when I do that I get this exception: `Execution failed for job: Crawler HTTP Collector
com.norconex.jef4.job.JobException: 2 out of…
-
Hi,
I have the following info in my log:
INFO [CrawlerEventManager] REJECTED_TOO_DEEP: https://en.wikipedia.org/wiki/Spanish_conquest_of_Yucat%C3%A1n
But I set: 5
The depth in the info line before is …
-
Hello,
I have the following configuration (file is attached):
[config.txt](https://github.com/Norconex/collector-http/files/335445/config.txt)
Now I run the job, and as a result I see:
INFO [CrawlerEventMana…
-
I've:
- downloaded the latest code from: http://www.norconex.com/collectors/collector-http/download
- unzipped the file
- gone to the root of the unzipped dir
- run: . ./collector-http.sh -a start -c exam…
-
I just ran the following simple crawler:
```
./tests-output/testattribute/progress
./tests-output/testattribute/logs
http://avax.news/fact/The_Day_in_Photos_Ju…
-
I can make it go into an infinite loop with these rules:
``` xml
jpg,gif,png,ico,css,js,gz,bz,tgz
http://.*nz/.*
http://.*nz
```
This is the log file:
``` xml
[niko@dev1 norconex-collector-…
-
Hi,
I'm using norconex-collector-http-2.5.1 with lib/norconex-importer-2.6.0-SNAPSHOT.jar
In the crawler configuration file below, you'll find that DESCRIPTION and DESCRIPTION-TEST3 have _exactly_ t…
-
Hi again,
**description of the problem**
I'm trying to extract the people from pages like:
http://finder.startupnationcentral.org/c/polymertal
For this purpose, I'm using the following code:
```
…
-
Hi Pascal,
I'm doing a small project with the Norconex HTTP Collector that fetches news items from a big news website when my city appears in the keywords metadata field.
The fetching works well but the …
-
Hi,
I've used Heritrix for a while, so I understand how to crawl websites. But since I'm not satisfied with Heritrix, I'm currently looking at alternatives.
Norconex's API docs are good and the XML c…