-
I just took the same [test page](https://herimedia.com/norconex-test.html) from issue #202.
Config:
``` xml
Date,Content-Type
…
-
At the beginning of the collector run, the console shows the information below about filters and module versions:
```
INFO [AbstractCrawlerConfig] Reference filter loaded: com.norc…
-
Hi Norconex team,
could you please take a look at the following issue:
Input CSV (exported from an Excel file):
```
id;title;text
1;"Epoch & Unix Timestamp Conversion Tools ""Time converter""";"Con…
-
Given a redirect from http://www.mascus.com/agriculture/used-other-tractor-accessories/%D0%B3%D1%96%D0%B4%D1%80%D0%B0%D0%B2%D0%BB%D1%96%D0%BA%D0%B0-%D1%81%D0%BF%D0%B5%D1%86%D1%82%D0%B5%D1%85%D0%BD%D1%…
-
Thank you for the new version. Going to test it immediately.
I was wondering how to go about it if I want to use keepDownloads.
Is this option viable for a scenario where you keep a copy of the website on …
-
Dear norconex team,
could you please clarify why the HTTP Collector checks the robots.txt of "remote" (or "external") sites even though it is configured to "stay-on-site", e.g.
```
h…
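To make the expected "stay on site" behavior concrete, here is a minimal Python sketch of a same-site test, assuming a site is identified by the scheme, host, and port of the start URL. The helpers `site_key` and `is_same_site` are hypothetical and are not the HTTP Collector's implementation; under this reading, a URL for which `is_same_site` is false would be rejected before any robots.txt lookup for its host.

```python
from urllib.parse import urlsplit

# Default ports, so "http://host/" and "http://host:80/" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}

def site_key(url: str) -> tuple:
    """Hypothetical helper: return (scheme, host, port) for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    # urlsplit().hostname is already lowercased.
    return (scheme, parts.hostname or "", port)

def is_same_site(start_url: str, candidate_url: str) -> bool:
    """Hypothetical helper: True if both URLs share scheme, host and port."""
    return site_key(start_url) == site_key(candidate_url)

start = "http://www.example.com/"
print(is_same_site(start, "http://www.example.com/page.html"))        # True
print(is_same_site(start, "http://external.example.org/robots.txt"))  # False
```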
-
First let me thank you for this wonderful piece of software!
I am using 2.3.0-SNAPSHOT and would like to avoid duplicate pages like http://example.com and http://example.com/.
So I tried to configur…
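For reference, RFC 3986 (section 6.2.3) treats an empty path and "/" as equivalent for http URLs, so http://example.com and http://example.com/ name the same resource. Below is a minimal Python sketch of deduplicating on a normalized form of each URL; `normalize_url` is a hypothetical helper illustrating the idea, not the Norconex URL normalizer or its configuration options.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url: str) -> str:
    """Hypothetical normalizer: lowercase scheme/host, drop the default
    port, and use "/" for an empty path. Ignores user info and IPv6
    brackets for brevity."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))

# Keep only the first URL seen for each normalized form.
seen = set()
unique = []
for u in ["http://example.com", "http://example.com/", "http://EXAMPLE.com:80/"]:
    key = normalize_url(u)
    if key not in seen:
        seen.add(key)
        unique.append(u)

print(unique)  # ['http://example.com']
```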
-
Crawling a Flickr site, say `https://www.flickr.com/photos/gobiernoextremadura`, with:
```
https://www\.flickr\.com/photos/gobiernoextremadura/.*
```
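One quick way to sanity-check the pattern is to test it against sample URLs, assuming the filter requires the whole URL to match (as Java's `Matcher.matches()` does). The sample URLs are made up for illustration and are not taken from the crawl. Under that assumption, the start URL as written above, without a trailing slash, does not match the pattern, because the pattern requires a "/" after the account name.

```python
import re

# Pattern copied from the configuration above.
PATTERN = re.compile(r"https://www\.flickr\.com/photos/gobiernoextremadura/.*")

# Hypothetical sample URLs, for illustration only.
samples = [
    "https://www.flickr.com/photos/gobiernoextremadura",
    "https://www.flickr.com/photos/gobiernoextremadura/",
    "https://www.flickr.com/photos/gobiernoextremadura/page2",
    "https://www.flickr.com/photos/someoneelse/",
]

for url in samples:
    # fullmatch() requires the entire URL to match the pattern.
    print(url, bool(PATTERN.fullmatch(url)))
```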
I only get 3 documents:
```
…
-
Here you can see some exceptions I got:
```
MC (crawler): 2015-08-05 17:57:10 WARN - Could not queue extracted URL "http://www.feccoo-extremadura.org/ensenanzaextremadura/Areas_Comunes:Salud_Laboral_…
-
I found some random behavior in the Committer: running a crawl against the same URL gives different results. Here is a section of the output when the behavior is correct:
```
INFO - AbstractCrawler …