-
collector.referrer-link-text can be extracted correctly, but it does not work for collector.referrer-link-title
-
We are evaluating Norconex HTTP Collector as a replacement for a custom-built web crawler. One of the domains that we would want to crawl is mascus.com, which provides a few dozen sitemaps all referenced …
niels updated
8 years ago
-
During launch of the collector I got the following messages in the log:
```
Unsupported HTTP response HTTP/1.1 301 Moved Permanently
REJECTED_REDIRECTED: http://www.site.com (Subject: HttpFetchResponse [craw…
```
-
At the beginning of the collector run, in the console, I get the information shown below, related to filters and modules versions:
```
INFO [AbstractCrawlerConfig] Reference filter loaded: com.norc…
```
-
Hi Norconex team,
could you please take a look at the following issue:
Input CSV (exported from an excel file):
```
id;title;text
1;"Epoch & Unix Timestamp Conversion Tools ""Time converter""";"Con…
```
-
Hi Pascal,
first of all, I'd like to thank you and your team for developing a new free crawler!
For many years we've been trying to find an alternative solution for the Autonomy http connector/f…
-
Is it possible to use CDATA or something similar in regex filters to escape `&` and other XML special characters that may appear in URLs?
```
someurl?param1=1&param2=2&param3=.*$
```
Replacing…
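
As a general illustration of the XML side of this question (not of Norconex's own configuration loader, whose behaviour is not confirmed here), the sketch below shows that a conforming XML parser yields the same text whether `&` is written as `&amp;` or wrapped in a CDATA section. The `filter` element name is hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

# The regex as the crawler should ultimately see it.
PATTERN = "someurl?param1=1&param2=2&param3=.*$"

# Variant 1: escape each ampersand as the predefined entity &amp;.
# The <filter> tag name is an assumption made for this example only.
escaped = ET.fromstring(
    "<filter>someurl?param1=1&amp;param2=2&amp;param3=.*$</filter>"
)

# Variant 2: wrap the raw pattern in a CDATA section, so no escaping is needed.
cdata = ET.fromstring(
    "<filter><![CDATA[someurl?param1=1&param2=2&param3=.*$]]></filter>"
)

# Both variants decode to the same string once parsed.
print(escaped.text == PATTERN)
print(cdata.text == PATTERN)
```

Either form should therefore be equivalent for any tool that reads its configuration through a standard XML parser; a bare `&` inside element text is not well-formed XML and would be rejected.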
-
Given a redirect from http://www.mascus.com/agriculture/used-other-tractor-accessories/%D0%B3%D1%96%D0%B4%D1%80%D0%B0%D0%B2%D0%BB%D1%96%D0%BA%D0%B0-%D1%81%D0%BF%D0%B5%D1%86%D1%82%D0%B5%D1%85%D0%BD%D1%…
niels updated
8 years ago
-
Hi
The HTTP Collector works well for pages with UTF-8 encoding, but for pages with another charset, such as 'gb2312', the results are garbled. Could you please let me know how I can resolve this pro…
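
The symptom described here is what typically happens when bytes in one charset are decoded as another. The sketch below is a generic Python illustration of that effect, not a description of how the collector detects encodings; the sample text and the helper name `charset_from_content_type` are assumptions made for the example.

```python
from email.message import Message


def charset_from_content_type(value, default="utf-8"):
    """Return the charset declared in a Content-Type header value, or a default."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset() or default


# Sample page text (an assumption for this example), encoded as a GB2312 page would be.
text = "中文网页"
raw = text.encode("gb2312")

# Decoding with the wrong charset produces replacement characters (garbled output).
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the charset declared by the server recovers the original text.
declared = charset_from_content_type("text/html; charset=GB2312")
right = raw.decode(declared)

print(wrong == text)
print(right == text)
```

If the server or page does not declare its charset at all, the helper falls back to the default, which is the situation in which garbled output is most likely.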
-
Dear Norconex team,
could you please clarify why the HTTP Collector checks the robots.txt of "remote" (or "external") sites, although it is configured to "stay-on-site", e.g.
```
h…
```